Abstract. We give an ergodic theoretic proof of a theorem of Duke about
equidistribution of closed geodesics on the modular surface. The proof is
closely related to the work of Yu. Linnik and B. Skubenko, who in particular
proved this equidistribution under an additional congruence assumption
on the discriminant. We give a more conceptual treatment using entropy theory,
and show how to use positivity of the discriminant as a substitute for
Linnik's congruence condition.

1. Introduction

A non-zero integer d is called a discriminant if it can be represented in the form 

\[ d = b^2 - 4ac, \qquad a, b, c \in \mathbb{Z}, \]

or equivalently if $d$ is the discriminant of the binary quadratic form with integral
entries

\[ q(x,y) = ax^2 + bxy + cy^2. \tag{1.1} \]



Date: February 2011. 

M. E. acknowledges the support by the Clay Mathematics Institute as a Research Scholar, by 
the NSF (grant 0554373) and the SNF (grant 200021-127145). E. L. acknowledges the support 
of NSF (grants DMS-0554345 and 0800345), the ISF (grant 983/09) and the European Research 
Council (Advanced Research Grant 267259). Ph. M. was partially supported by the SNF (grant 
200021-125291) and the European Research Council (Advanced Research Grant 228304).
A. V. was supported by the Clay Mathematics Institute and by NSF Grant DMS-0903110. 









M. EINSIEDLER, E. LINDENSTRAUSS, PH. MICHEL, AND A. VENKATESH 



It is easy to see that $d$ is a discriminant if and only if $d \equiv 0, 1 \pmod 4$. A discriminant
$d$ is fundamental if either $d$ is square-free (in which case $d$ is congruent to 1 modulo
4) or $d/4$ is a square-free integer congruent to $2, 3 \pmod 4$. Equivalently, $d$ is
fundamental if it is the discriminant of the ring of integers of a quadratic field.
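
The congruence criteria above are easy to test mechanically. The following Python sketch is an illustration only: the helper names `is_squarefree`, `is_discriminant` and `is_fundamental` are ours and do not come from the text. It applies the stated criterion literally, so it does not treat the degenerate value $d = 1$ specially.

```python
def is_squarefree(n: int) -> bool:
    """Return True if no prime square divides the non-zero integer n."""
    n = abs(n)
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True


def is_discriminant(d: int) -> bool:
    """A non-zero integer d is a discriminant iff d = 0 or 1 (mod 4)."""
    return d != 0 and d % 4 in (0, 1)


def is_fundamental(d: int) -> bool:
    """Apply the criterion stated above for a fundamental discriminant."""
    if not is_discriminant(d):
        return False
    if d % 4 == 1:
        return is_squarefree(d)
    m = d // 4  # exact division, since d = 0 (mod 4) in this branch
    return m % 4 in (2, 3) and is_squarefree(m)
```

For example, `is_fundamental(377)` holds because $377 = 13 \cdot 29$ is square-free and congruent to 1 modulo 4, whereas `is_fundamental(20)` fails because $20/4 = 5 \equiv 1 \pmod 4$.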

The study of integral binary quadratic forms goes back at least to the Greeks.
Significant breakthroughs were accomplished by Gauss. In his Disquisitiones Arithmeticae
he studied the set of $\mathrm{GL}_2(\mathbb{Z})$-orbits of such forms, where $\mathrm{GL}_2(\mathbb{Z})$ acts via
the linear change of variables: for $\gamma = \begin{pmatrix} u & v \\ w & z \end{pmatrix} \in \mathrm{GL}_2(\mathbb{Z})$,

\[ \gamma.q(x,y) = \det(\gamma)^{-1}\, q((x,y)\gamma) = \frac{1}{uz - vw}\, q(ux + wy,\ vx + zy). \tag{1.2} \]
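
As a sanity check on (1.2), the following sketch expands $q(ux + wy, vx + zy)$ and rescales by $\det(\gamma)^{-1} = \pm 1$, so that one can confirm on examples that the discriminant is unchanged. The function names `disc` and `act`, and the convention of storing a form as the coefficient triple `(a, b, c)`, are assumptions made for this illustration.

```python
def disc(q):
    """Discriminant b^2 - 4ac of the form a*x^2 + b*x*y + c*y^2."""
    a, b, c = q
    return b * b - 4 * a * c


def act(gamma, q):
    """Apply (1.2): (gamma.q)(x, y) = det(gamma)^(-1) * q(u*x + w*y, v*x + z*y)."""
    (u, v), (w, z) = gamma
    a, b, c = q
    det = u * z - v * w
    if det not in (1, -1):
        raise ValueError("gamma must lie in GL_2(Z)")
    # Coefficients of q(u*x + w*y, v*x + z*y) after expanding.
    a2 = a * u * u + b * u * v + c * v * v
    b2 = 2 * a * u * w + b * (u * z + v * w) + 2 * c * v * z
    c2 = a * w * w + b * w * z + c * z * z
    # det is +1 or -1, so multiplying by det is the same as dividing by it.
    return (det * a2, det * b2, det * c2)
```

For instance, with `q = (1, 1, -94)` of discriminant 377, `disc(act(((2, 1), (1, 1)), q))` is again 377.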

This action preserves the discriminant, and Gauss proved that the set of $\mathrm{GL}_2(\mathbb{Z})$-orbits
of integral binary quadratic forms of a given discriminant is finite; see [7, pg.
128] for an accessible and more general treatment. Let

\[ \mathcal{R}_{\mathrm{disc}}(d) = \{ q(x,y) = ax^2 + bxy + cy^2,\ a, b, c \in \mathbb{Z},\ \mathrm{disc}(q) = d,\ \gcd(a,b,c) = 1 \} \]
\[ \simeq \{ (a,b,c) \in \mathbb{Z}^3,\ \mathrm{disc}(a,b,c) = b^2 - 4ac = d,\ \gcd(a,b,c) = 1 \} \]

denote the set of forms of discriminant $d$ with coprime coefficients, and let

\[ [\mathcal{R}_{\mathrm{disc}}(d)] = \mathrm{GL}_2(\mathbb{Z}) \backslash \mathcal{R}_{\mathrm{disc}}(d) \]

be the set of orbits; its cardinality is the class number and is denoted $h(d)$. Gauss also
showed that the set $[\mathcal{R}_{\mathrm{disc}}(d)]$ can be given the additional structure of an abelian
group (the law of composition of quadratic forms), leading to the notion of the class
group of quadratic forms of discriminant $d$. Nowadays these venerable and beautiful
results are usually interpreted in terms of the theory of quadratic fields and ideal
class groups. We will recall this connection below.

1.1. Linnik and Skubenko equidistribution theorems. In the late 50's, Linnik
studied more refined properties of the set of representations $\mathcal{R}_{\mathrm{disc}}(d)$, in particular
their distribution properties.
Let

\[ V_{\mathrm{disc},\pm 1}(\mathbb{R}) = \{ (a,b,c) \in \mathbb{R}^3,\ b^2 - 4ac = \pm 1 \}; \]

this is a one-sheeted hyperboloid in the $+1$ case and a two-sheeted hyperboloid
in the $-1$ case, and is identified with the set of real binary quadratic forms with
discriminant $\pm 1$. In both cases $V_{\mathrm{disc},\pm 1}(\mathbb{R})$ is invariant under the natural action of
$\mathrm{GL}_2(\mathbb{R})$ extending (1.2) and has one orbit.

The set of representations $\mathcal{R}_{\mathrm{disc}}(d)$ projects to $V_{\mathrm{disc},\pm 1}(\mathbb{R})$ (with $\pm 1 = \mathrm{sign}(d)$)
by a homothety

\[ |d|^{-1/2} \mathcal{R}_{\mathrm{disc}}(d) \subset V_{\mathrm{disc},\pm 1}(\mathbb{R}), \]

and Linnik studied how this set is distributed when $d \to \infty$. These hyperboloids
carry a natural $\mathrm{GL}_2(\mathbb{R})$-invariant measure $\mu_{\mathrm{disc},\pm 1}$ defined, for any open set
$\Omega \subset V_{\mathrm{disc},\pm 1}(\mathbb{R})$, as the Lebesgue measure in $\mathbb{R}^3$ of the solid cone emanating from the
origin and ending at $\Omega$, i.e.

\[ \mu_{\mathrm{disc},\pm 1}(\Omega) = \mu_{\mathbb{R}^3}(C(\Omega)) \]

where

\[ C(\Omega) = \{ r.x,\ x \in \Omega,\ r \in [0,1] \}. \]

Using an original argument of ergodic theoretic flavor, Linnik [19, Chap. V]
established the following equidistribution statement for negative discriminants.

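Theorem 1.1 (Linnik). Let $p > 2$ be a fixed prime. As $d \to -\infty$ amongst the
negative discriminants such that $\left(\frac{d}{p}\right) = 1$, the set

\[ |d|^{-1/2} \mathcal{R}_{\mathrm{disc}}(d) \subset V_{\mathrm{disc},-1}(\mathbb{R}) \]

becomes equidistributed with respect to $\mu_{\mathrm{disc},-1}$, in the following sense: for any
two continuous compactly supported functions $\varphi_1, \varphi_2$ on $V_{\mathrm{disc},-1}(\mathbb{R})$ such that the
integral $\mu_{\mathrm{disc},-1}(\varphi_2) \neq 0$, we have

\[ \frac{\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_1(|d|^{-1/2} x)}{\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_2(|d|^{-1/2} x)} \to \frac{\mu_{\mathrm{disc},-1}(\varphi_1)}{\mu_{\mathrm{disc},-1}(\varphi_2)} \quad \text{as } d \to -\infty. \]

In particular, $\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_2(|d|^{-1/2} x) \neq 0$ if $d$ as above is large enough.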

Building on Linnik's ergodic method Skubenko [24] (see also [19, Chap. VI.]) 
proved the analogous statement for positive discriminants: 

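Theorem 1.2 (Skubenko). Let $p > 2$ be a fixed prime. As $d \to +\infty$ amongst the
positive discriminants such that $\left(\frac{d}{p}\right) = 1$, the set

\[ |d|^{-1/2} \mathcal{R}_{\mathrm{disc}}(d) \subset V_{\mathrm{disc},+1}(\mathbb{R}) \]

becomes equidistributed with respect to $\mu_{\mathrm{disc},+1}$, in the following sense: for any
two continuous compactly supported functions $\varphi_1, \varphi_2$ on $V_{\mathrm{disc},+1}(\mathbb{R})$ such that the
integral $\mu_{\mathrm{disc},+1}(\varphi_2) \neq 0$, we have

\[ \frac{\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_1(|d|^{-1/2} x)}{\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_2(|d|^{-1/2} x)} \to \frac{\mu_{\mathrm{disc},+1}(\varphi_1)}{\mu_{\mathrm{disc},+1}(\varphi_2)} \quad \text{as } d \to +\infty. \]

In particular, $\sum_{x \in \mathcal{R}_{\mathrm{disc}}(d)} \varphi_2(|d|^{-1/2} x) \neq 0$ if $d$ as above is large enough.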

We refer to Figure 1 for an illustration of the case $d = 377$.

The condition $\left(\frac{d}{p}\right) = 1$ for some fixed prime $p$ is equivalent to the condition
that

the fixed prime $p$ splits in the quadratic field $\mathbb{Q}(\sqrt{d})$.
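
For a concrete discriminant, Linnik's condition can be tested with Euler's criterion, $\left(\frac{d}{p}\right) \equiv d^{(p-1)/2} \pmod p$ for an odd prime $p$. The sketch below is illustrative only; the function names `legendre` and `linnik_primes` and the default list of candidate primes are our own choices.

```python
def legendre(d: int, p: int) -> int:
    """Legendre symbol (d/p) for an odd prime p, via Euler's criterion."""
    r = pow(d % p, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def linnik_primes(d: int, primes=(3, 5, 7, 11, 13, 17, 19, 23)):
    """Odd primes from the given list at which (d/p) = 1, i.e. p splits."""
    return [p for p in primes if legendre(d, p) == 1]
```

With $d = 377$ this reports, for example, that $p = 11$ satisfies the condition (since $377 \equiv 3 \equiv 5^2 \pmod{11}$) while $p = 3$ does not.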

This condition (which we shall refer to as Linnik's condition) was an essential input
for Linnik's ergodic method but, as was pointed out by Linnik himself, it should
not be necessary for the equidistribution theorem to hold. Only thirty years
later was this condition removed, in the beautiful work of Duke [9].

1.2. Duke's theorem. A key point of Duke's approach is to reformulate the prior
theorems in a dual form: in terms of equidistribution of "Heegner points" (for
negative $d$) or of closed geodesics (for positive $d$) on the modular surface
$Y_0(1) := \mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$.

Figure 1. The distribution of $377^{-1/2}\mathcal{R}_{\mathrm{disc}}(377)$ viewed on the
one-sheeted hyperboloid; note that $h(377) = 1$.



Assuming that $d$ is not a square, one associates to any $(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)$ the
geodesic corresponding to the geodesic semi-circle in the upper half plane whose
end points are

\[ \frac{-b \pm \sqrt{d}}{2a}. \tag{1.3} \]
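
The two end points are simply the real roots of $ax^2 + bx + c$, so the semi-circle has center $-b/(2a)$ and radius $\sqrt{d}/(2|a|)$. A minimal numerical sketch, in which the function name `geodesic_endpoints` is an assumption of this illustration and floating-point arithmetic is used throughout:

```python
import math


def geodesic_endpoints(a: int, b: int, c: int):
    """End points (-b -/+ sqrt(d)) / (2a) on the real line, in increasing order."""
    d = b * b - 4 * a * c
    if d <= 0 or math.isqrt(d) ** 2 == d:
        raise ValueError("d must be a positive non-square discriminant")
    # a cannot vanish here: a = 0 would force d = b^2 to be a square.
    r = math.sqrt(d)
    x1, x2 = (-b - r) / (2 * a), (-b + r) / (2 * a)
    return (min(x1, x2), max(x1, x2))
```

For the form $(1, 1, -94)$ of discriminant 377 the end points are $(-1 \pm \sqrt{377})/2$, and their difference is $\sqrt{377}$.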

We lift this geodesic in the obvious way to the unit tangent bundle of $\mathbb{H}$ and then
project it to a geodesic orbit on the unit tangent bundle $T^1(Y_0(1))$. This geodesic
orbit, which we denote by $\gamma_{[a,b,c]}$, is compact and depends only on the $\mathrm{SL}_2(\mathbb{Z})$-orbit
of $(a,b,c)$. We obtain in this way a collection of $h(d)$ closed geodesics

\[ \mathcal{G}_d = \bigcup_{[a,b,c]} \gamma_{[a,b,c]} \subset T^1(Y_0(1)), \]

see Figure 2 for the case $d = 377$. This collection of compact orbits of the geodesic
flow then carries a natural probability measure invariant under the geodesic flow,
which we denote by $\mu_d$. Let $\mu_L$ be the Liouville (Haar) probability measure on
$T^1(Y_0(1))$; then Duke's theorem (as extended by Chelluri [8] to the unit tangent
bundle) gives the following:

Theorem 1.3 (Duke). As $d \to +\infty$ amongst the positive fundamental discriminants,
the set $\mathcal{G}_d$ becomes equidistributed on the unit tangent bundle $T^1(Y_0(1))$ with
respect to the measure $\mu_L$: for any continuous compactly supported function $\varphi$ on
$T^1(Y_0(1))$,

\[ \int_{\mathcal{G}_d} \varphi(t)\, d\mu_d(t) \to \int_{T^1(Y_0(1))} \varphi(u)\, d\mu_L(u). \]

Figure 2. The distribution of $\mathcal{G}_{377}$ projected on the fundamental
domain of $\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$; note that $h(377) = 1$.

The equivalence of the equidistribution statements in Theorem 1.2 and Theorem 1.3
will be explained in §2.4.

The restriction to fundamental discriminants is not essential; indeed all the
proofs extend to the general case, including the one we present here. Duke's proof
is fundamentally different from Linnik's; it does not rely on ergodic theory but
on harmonic analysis on the modular surface $\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$, that is, on the theory of
automorphic forms supplemented by deep arguments from analytic number theory,
in particular a breakthrough of Iwaniec [17].

In this paper we give a new proof of Duke's theorem in the case of positive
discriminant. Our proof is strongly influenced by Linnik's ergodic method and
may be seen as a modern incarnation of Linnik's original ideas; we use the
positivity of the discriminant as a substitute for Linnik's condition, on which
Skubenko relied in his work.

There are two main ingredients in the proof:

(1) Linnik's Basic Lemma: an upper bound on the number of nearby pairs
of points in the projection of $\mathcal{R}_{\mathrm{disc}}(d)$ to $V_{\mathrm{disc},+1}(\mathbb{R})$ (as this set is infinite,
the quantity to be bounded needs some additional interpretation), which
eventually reduces to an upper bound on the number of ways a given binary
quadratic form can be represented by a ternary quadratic form.
(2) The uniqueness of measure of maximal entropy for the flow corresponding 



We have made an effort to present both of these main ingredients in a self-contained 
way, as each relies on some well-known results that are unfortunately well-known 
in essentially disjoint circles of mathematicians. 

The second of these two ingredients replaces a more explicit but less conceptual
argument of Linnik and Skubenko. The uniqueness of the measure of maximal
entropy for this action is well-known (both in the cocompact and finite volume
case) and in the cocompact case dates back to work of R. Bowen [4]. However,
the version we give here is new in that it allows us to control how much weight
$\mathcal{G}_d$ gives to small neighborhoods of the cusp in $\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$: essentially, we give a
finitary version of the uniqueness of the measure of maximal entropy in the noncompact
quotient $\mathrm{SL}_2(\mathbb{Z})\backslash\mathrm{SL}_2(\mathbb{R})$. This finitary version is the content of Theorem 4.2, and
involves a careful analysis of how much entropy can be carried by $a_t$-invariant
measures that give disproportionately high weight to the cusp. A cleaner version
of the relationship between entropy and mass in the cusp (although not directly
applicable for our main purposes) is given in Theorem 5.1. We believe these results
are of independent interest and will likely have other applications; they also raise
some interesting new questions (see e.g. [11]).

We mention that another modern exposition of Linnik's method in a similar 
context (distribution of integer points on spheres) by J. Ellenberg and two of us 
(Ph.M. and A.V.) has appeared already in [14]. In that work Linnik's Basic Lemma 
is again a central ingredient, complemented by a different argument to convert the 
upper bounds provided by the Basic Lemma to equidistribution (i.e. both upper 
and lower bounds on number of points in specified regions). The reader may wish 
to compare these two complementary approaches. 

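1.3. Notation. We collect here some notation that is used throughout the paper.

The group $\mathrm{SL}_2(\mathbb{R})$ acts transitively on the upper half-plane model $\mathbb{H}$ of the
hyperbolic plane by fractional linear transformations, and the stabilizer of the point
$i$ is the compact subgroup $\mathrm{SO}_2(\mathbb{R})$. The resulting identification

\[ \mathbb{H} \simeq \mathrm{SL}_2(\mathbb{R})/\mathrm{SO}_2(\mathbb{R}) \]

descends to an identification of $\mathbb{H}$ with $\mathrm{PSL}_2(\mathbb{R})/\mathrm{PSO}_2(\mathbb{R})$; moreover the action of
$\mathrm{PSL}_2(\mathbb{R})$ on the unit tangent bundle $T^1(\mathbb{H})$ is simply transitive. If we let $p \in T^1(\mathbb{H})$ be
the tangent vector pointing up at $i$, then $g \mapsto gp$ gives an identification $\mathrm{PSL}_2(\mathbb{R}) \simeq T^1(\mathbb{H})$.
Taking the quotient by $\mathrm{PSL}_2(\mathbb{Z})$ we obtain an identification with the unit
tangent bundle of the modular curve$^1$: $\mathrm{PSL}_2(\mathbb{Z})\backslash\mathrm{PSL}_2(\mathbb{R}) \simeq T^1(\mathrm{PSL}_2(\mathbb{Z})\backslash\mathbb{H})$.

We shall make use of another identification of the quotient $\mathrm{PSL}_2(\mathbb{Z})\backslash\mathrm{PSL}_2(\mathbb{R})$,
namely with the space of lattices in $\mathbb{R}^2$ up to homothety. Indeed, the space of
lattices $\mathcal{L}_2(\mathbb{R})$ is identified with $\mathrm{GL}_2(\mathbb{Z})\backslash\mathrm{GL}_2(\mathbb{R})$ via $g \mapsto \mathbb{Z}^2.g$; the same map also
identifies the space $[\mathcal{L}_2(\mathbb{R})]$ of lattices up to homothety with $\mathrm{PGL}_2(\mathbb{Z})\backslash\mathrm{PGL}_2(\mathbb{R})$
and the set $\mathcal{L}_2^1(\mathbb{R}) = X$ of lattices of covolume one with $\mathrm{SL}_2(\mathbb{Z})\backslash\mathrm{SL}_2(\mathbb{R}) =
\mathrm{PSL}_2(\mathbb{Z})\backslash\mathrm{PSL}_2(\mathbb{R})$. Finally, the sets $[\mathcal{L}_2(\mathbb{R})]$ and $\mathcal{L}_2^1(\mathbb{R})$ are also identified via
the map

\[ [L] \mapsto \mathrm{vol}(L)^{-1/2} L. \]

$^1$ Actually the modular curve has singularities at the points $i$ and $j = \frac{-1+\sqrt{-3}}{2}$, owing to the fact
that these points have non-trivial stabilizers in $\mathrm{PSL}_2(\mathbb{Z})$; we will ignore this minor point.

DISTRIBUTION OF CLOSED GEODESICS

Thus the following spaces are identified:

\[ X \simeq \mathrm{PSL}_2(\mathbb{Z})\backslash\mathrm{PSL}_2(\mathbb{R}) \simeq T^1(\mathrm{PSL}_2(\mathbb{Z})\backslash\mathbb{H}) \simeq [\mathcal{L}_2(\mathbb{R})]. \]

When we speak of "the lattice corresponding to $x \in X$," we have in mind always
the image of $x$ under the isomorphism $X \simeq \mathcal{L}_2^1(\mathbb{R})$.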
We take the following fundamental domain

\[ \mathcal{F} = \{ (z, v) \in \mathbb{H} \times S^1,\ |\Re z| < 1/2,\ |z| > 1 \} \subset T^1(\mathbb{H}) \simeq \mathrm{PSL}_2(\mathbb{R}) \]

for $\mathrm{PSL}_2(\mathbb{Z}) = \Gamma$.

Fix an arbitrary left-invariant Riemannian metric $d$ on $\mathrm{PSL}_2(\mathbb{R})$. It descends to
a metric on $X$, denoted $d_X$ or simply $d$ for short. Explicitly we have

\[ d_X(\mathrm{PSL}_2(\mathbb{Z}) g_1, \mathrm{PSL}_2(\mathbb{Z}) g_2) = \min_{\gamma \in \mathrm{PSL}_2(\mathbb{Z})} d(g_1, \gamma g_2). \tag{1.4} \]

The geodesic curves on $T^1(\mathbb{H})$ (which in the upper half-plane are circles and
lines intersecting the real axis in a normal angle) correspond to the
right $A$-orbits in $\mathrm{PSL}_2(\mathbb{R})$, where $A = \{a_t\}$ is the diagonal subgroup of $\mathrm{PSL}_2(\mathbb{R})$.
By a slight abuse, we shall use $A$ to refer to the diagonal subgroup of all three
groups: $\mathrm{GL}_2(\mathbb{R})$, $\mathrm{PGL}_2(\mathbb{R})$ and $\mathrm{SL}_2(\mathbb{R})$.

Acknowledgements: The authors would like to thank Peter Sarnak for encouragement
and many helpful conversations. A.V. would also like to thank Jordan
Ellenberg for many discussions on the topic of quadratic forms. The authors also
thank Menny Aka, Asaf Katz, Ilya Khayutin, Lior Rosenzweig for carefully going
over a preliminary version of this paper.

2. Representations by the discriminant, orbits and quadratic fields

In this section we explain in greater detail the relationship between Skubenko's
equidistribution theorem and Duke's and connect these questions to the arithmetic
of real quadratic fields. Along the way we will find a few equivalent ways in which to
describe compact $A$-orbits in $X$. Building on that we prove in §2.4 the equivalence
between Skubenko's and Duke's formulations.

2.1. Overview of the bijections. Recall that we have previously associated to
any element of $[\mathcal{R}_{\mathrm{disc}}(d)]$, i.e. to any $\mathrm{GL}_2(\mathbb{Z})$-orbit in $\mathcal{R}_{\mathrm{disc}}(d)$, a closed geodesic
on $\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$. On the other hand, as discussed in §1.3, a closed geodesic in $\mathcal{G}_d$
corresponds to a closed $A$-orbit on the space $X$.

Write $\mathcal{O}_d := \mathbb{Z}\left[\frac{d+\sqrt{d}}{2}\right]$ for the order of discriminant $d$.

We shall show below that the following sets are in natural bijection to each other:

i. $[\mathcal{R}_{\mathrm{disc}}(d)]$, the set of $\mathrm{GL}_2(\mathbb{Z})$-orbits of primitive representations in $\mathcal{R}_{\mathrm{disc}}(d)$.

ii. The set of $\mathrm{GL}_2(\mathbb{Z})$-conjugacy classes of ring embeddings $\iota : \mathcal{O}_d \hookrightarrow M_2(\mathbb{Z})$
which are optimal, i.e. for which the embedding cannot be extended to an
embedding of a strictly bigger order $\mathcal{O} \supsetneq \mathcal{O}_d$ with image in $M_2(\mathbb{Z})$.

iii. $\mathrm{Cl}(\mathcal{O}_d)$ = the set of $K^\times$-homothety classes of proper $\mathcal{O}_d$-ideals.

In the case of a fundamental discriminant the above objects and their bijections
are a bit easier to explain. In fact, if $d$ is a fundamental discriminant, then every
representation is primitive, every embedding is optimal, and every $\mathcal{O}_d$-ideal is
proper. In reading the remainder of the section the reader may first specialize to
this case, or even continue reading with Section 3 and only refer to the portions of
this section as needed for the remainder of the paper.

2.2. Discriminant and quadratic fields. We establish the bijections of §2.1.
Before beginning, we note that the sequence of maps

\[ ax^2 + bxy + cy^2 \mapsto \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \mapsto \begin{pmatrix} b & -2a \\ 2c & -b \end{pmatrix} \tag{2.1} \]

defines an isometry between the spaces of (real) binary quadratic forms, symmetric
$2 \times 2$ real matrices and trace zero $2 \times 2$ real matrices, where each of those is equipped
with a quadratic form:

\[ (Q(\mathbb{R}^2), \mathrm{disc}) \simeq (\mathrm{Sym}_2(\mathbb{R}), -4\det) \simeq (M_2^0(\mathbb{R}), -\det). \]

The action of $\mathrm{GL}_2(\mathbb{Z})$ in (1.2) is the restriction of the following action of $\mathrm{GL}_2(\mathbb{R})$
on $Q(\mathbb{R}^2)$:

\[ g.q(x,y) = \frac{1}{\det g}\, q((x,y)g), \]

which intertwines with the actions

\[ g.(ax^2 + bxy + cy^2) \mapsto \frac{1}{\det g}\, g \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} g^t \mapsto g \begin{pmatrix} b & -2a \\ 2c & -b \end{pmatrix} g^{-1}. \]

Observe that these actions factor through $\mathrm{PGL}_2(\mathbb{R})$. They also induce an isomorphism
between $\mathrm{PGL}_2(\mathbb{Z})$ and the group of orthogonal transformations of $(Q(\mathbb{R}^2), \mathrm{disc})$
preserving the integral quadratic forms.

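Let $d$ be a discriminant which is not a perfect square; let $(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)$ be
a representation, and let

\[ m = m_{a,b,c} = \begin{pmatrix} b & -2a \\ 2c & -b \end{pmatrix} \tag{2.2} \]

be the trace zero matrix associated to it via the map (2.1). Since

\[ m^2 = d \cdot \mathrm{Id}, \]

this defines an embedding of the quadratic field ($d$ is not a square) $K = \mathbb{Q}(\sqrt{d})$
into $M_2(\mathbb{Q})$:

\[ \iota_m : K \hookrightarrow M_2(\mathbb{Q}), \qquad u + v\sqrt{d} \mapsto u\,\mathrm{Id} + v.m. \]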
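
The identity $m^2 = d \cdot \mathrm{Id}$, and the resulting multiplicativity of $u + v\sqrt{d} \mapsto u\,\mathrm{Id} + v.m$, can be checked directly on integer examples. In the sketch below the helper names `trace_zero_matrix`, `mat_mul` and `embed` are illustrative assumptions, and matrices are stored as tuples of rows.

```python
def trace_zero_matrix(a, b, c):
    """The matrix m_{a,b,c} of (2.2), as a tuple of rows."""
    return ((b, -2 * a), (2 * c, -b))


def mat_mul(p, q):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def embed(u, v, m):
    """Image u*Id + v*m of the element u + v*sqrt(d)."""
    return ((u + v * m[0][0], v * m[0][1]), (v * m[1][0], u + v * m[1][1]))
```

For $(a,b,c) = (1,1,-94)$ one finds `mat_mul(m, m) == ((377, 0), (0, 377))`, and the product of `embed(u1, v1, m)` and `embed(u2, v2, m)` equals `embed(u1*u2 + 377*v1*v2, u1*v2 + u2*v1, m)`, mirroring multiplication in $\mathbb{Q}(\sqrt{377})$.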

2.2.1. Representations and optimal embedding. The integrality properties of this
embedding are measured by considering

\[ \mathcal{O}_m := \iota_m^{-1}(M_2(\mathbb{Z})), \]

which is an order in $K$. Let us identify which order. Note that $\mathcal{O}_{\lambda.m} = \mathcal{O}_m$ for any
$\lambda \in \mathbb{Q}^\times$. Hence if $b^2 - 4ac = d$ for $a, b, c \in \mathbb{Z}$ we may write

\[ (a,b,c) = f(a',b',c') \]

with $f \in \mathbb{Z}$ and $a', b', c' \in \mathbb{Z}$ coprime integers satisfying

\[ \mathrm{disc}(a',b',c') = d' = d/f^2. \]

This reduces the discussion to the case where $(a,b,c)$ is a primitive representation
of $d$ (a representation with coprime entries).

Assuming that $(a,b,c)$ is primitive, one sees quickly that

\[ \mathcal{O}_m = \mathcal{O}_d = \mathbb{Z}\left[\frac{d+\sqrt{d}}{2}\right] \tag{2.3} \]

is the order of discriminant $d$. If (2.3) holds, we say that $\iota_m$ defines an optimal
embedding of $\mathcal{O}_d$ into $M_2(\mathbb{Z})$. We obtain in that way a bijection between

the set of $\mathrm{GL}_2(\mathbb{Z})$-orbits of primitive representations $[\mathcal{R}_{\mathrm{disc}}(d)]$

and

the set of $\mathrm{GL}_2(\mathbb{Z})$-conjugacy classes of optimal embeddings $\iota : \mathcal{O}_d \hookrightarrow M_2(\mathbb{Z})$.
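
To see the role of primitivity concretely, one can compute the image of an element $u + v\sqrt{d}$ with rational $u, v$ and test whether it is an integral matrix. The sketch below uses exact rational arithmetic; the names `image` and `is_integral` are assumptions of this illustration. For a primitive triple the generator $\frac{d+\sqrt{d}}{2}$ has integral image, while for a non-primitive triple the preimage of $M_2(\mathbb{Z})$ is strictly larger than $\mathcal{O}_d$.

```python
from fractions import Fraction


def image(u, v, a, b, c):
    """Matrix u*Id + v*m_{a,b,c} representing u + v*sqrt(d), with exact rationals."""
    u, v = Fraction(u), Fraction(v)
    return ((u + v * b, -2 * a * v), (2 * c * v, u - v * b))


def is_integral(mat):
    """True if every entry of the matrix is an integer."""
    return all(x.denominator == 1 for row in mat for x in row)


# Primitive case (a, b, c) = (1, 1, -94), d = 377:
# the generator (d + sqrt(d))/2 of O_d maps to an integral matrix.
primitive_ok = is_integral(image(Fraction(377, 2), Fraction(1, 2), 1, 1, -94))

# Non-primitive case (2, 2, -188), d = 4 * 377 = 1508:
# (1 + sqrt(377))/2 = 1/2 + sqrt(1508)/4 is not in O_1508 = Z[sqrt(377)],
# yet its image is integral, so the embedding of O_1508 is not optimal.
non_optimal = is_integral(image(Fraction(1, 2), Fraction(1, 4), 2, 2, -188))
```

Both flags evaluate to `True`, which illustrates why the discussion is reduced to primitive representations before (2.3) is asserted.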

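2.2.2. Embeddings and ideal classes. Let us recall that a lattice $I \subset K$ is a proper
$\mathcal{O}_d$-ideal iff

\[ \mathcal{O}_I := \{ \lambda \in K,\ \lambda.I \subset I \} = \mathcal{O}_d. \]

Then there is a bijection between

the set of $\mathrm{GL}_2(\mathbb{Z})$-conjugacy classes of optimal embeddings of $\mathcal{O}_d$

and the set of proper ideal classes of $\mathcal{O}_d$,

$\mathrm{Cl}(\mathcal{O}_d)$ = the set of $K^\times$-homothety classes of proper $\mathcal{O}_d$-ideals.

This bijection goes as follows [18]: Given a proper $\mathcal{O}_d$-ideal $I \subset K$, one may
choose a $\mathbb{Z}$-basis $I = \mathbb{Z}.\alpha + \mathbb{Z}.\beta$, which gives an identification

\[ \theta : I \to \mathbb{Z}^2, \qquad u\alpha + v\beta \mapsto (u, v). \]

This identification induces the embedding $\iota_I : K \hookrightarrow M_2(\mathbb{Q})$
defined by

\[ \iota(\lambda)(u, v) = \theta(\lambda.(u\alpha + v\beta)), \]

(or in other terms, such that $\theta(\lambda.x) = \theta(x)\iota(\lambda)$).

Since $\mathcal{O}_d.I \subset I$, one has $\iota(\mathcal{O}_d)\mathbb{Z}^2 \subset \mathbb{Z}^2$, that is $\iota(\mathcal{O}_d) \subset M_2(\mathbb{Z})$, and the fact that
$I$ is a proper $\mathcal{O}_d$-ideal is equivalent to the fact that $\iota$ is an optimal embedding of
$\mathcal{O}_d$. If we replace the $\mathbb{Z}$-basis $(\alpha, \beta)$ by another basis $(\alpha', \beta')$ then $\iota$ is replaced by
a $\mathrm{GL}_2(\mathbb{Z})$-conjugate. Finally if $I$ is replaced by an ideal in the same class, $I' = \lambda.I$ with
$\lambda \in K^\times$, then the corresponding $\mathrm{GL}_2(\mathbb{Z})$-conjugacy classes coincide: $[\iota_{I'}] = [\iota_I]$.

The inverse of the map

\[ [I] \mapsto [\iota_I] \]

is as follows: given an optimal embedding $\iota : K \hookrightarrow M_2(\mathbb{Q})$ of $\mathcal{O}_d$, let $e_1 = (1, 0) \in \mathbb{Z}^2$
be the first vector of the standard basis$^2$ of $\mathbb{Z}^2$; then the map

\[ \theta : K \to \mathbb{Q}^2, \qquad \lambda \mapsto e_1.\iota(\lambda) \]

is an isomorphism of $\mathbb{Q}$-vector spaces. Next define the lattice $I = \theta^{-1}(\mathbb{Z}^2)$ in $K$,
which is invariant under multiplication by $\mathcal{O}_d$. In other words, $I$ is an $\mathcal{O}_d$-ideal, and
$I$ being proper is equivalent to $\iota$ being optimal.

$^2$ We could have chosen any primitive vector in $\mathbb{Z}^2$.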






2.2.3. The Picard group of the order $\mathcal{O}_d$. We now recall the definition and basic
properties of the Picard group for an order $\mathcal{O}_d$ in a quadratic field.
The product of two $\mathcal{O}_d$-ideals $I$ and $J$ gives another $\mathcal{O}_d$-ideal, namely the lattice
generated by the products of their elements,

\[ I \cdot J = \mathrm{span}_{\mathbb{Z}}\{ \lambda\lambda' : \lambda \in I,\ \lambda' \in J \}; \]

and clearly this operation respects the equivalence relation introduced above on
$\mathcal{O}_d$-ideals. An $\mathcal{O}_d$-ideal $I$ is invertible if there is some $\mathcal{O}_d$-ideal $J$ so that $I \cdot J = \mathcal{O}_d$.
An $\mathcal{O}_d$-ideal $I$ is locally principal if for any prime $p$,

\[ I_p := I \otimes_{\mathbb{Z}} \mathbb{Z}_p = \lambda_p (\mathcal{O}_d)_p, \]

where $(\mathcal{O}_d)_p = \mathcal{O}_d \otimes_{\mathbb{Z}} \mathbb{Z}_p$ and $\lambda_p$ is an element of $(K \otimes_{\mathbb{Q}} \mathbb{Q}_p)^\times$. Both properties
depend only on the ideal class $[I]$ and not on $I$ itself.

For general orders $\mathcal{O}$ in number fields and $\mathcal{O}$-ideals $I$, one has the following
implications

\[ I \text{ is locally principal} \Rightarrow I \text{ is invertible} \Rightarrow I \text{ is proper}. \]

We shall make use of the following property of orders in quadratic number fields:

Proposition 2.1. For the orders $\mathcal{O}_d$ in quadratic number fields the inverse implication

\[ I \text{ is proper} \Rightarrow I \text{ is locally principal} \]

holds for $\mathcal{O}_d$-ideals $I$. In particular, the set of proper ideal classes $\mathrm{Cl}(\mathcal{O}_d)$, endowed
with the composition law induced by forming the product of two lattices, has the
structure of an abelian group.

This nice special feature of quadratic orders comes from the fact that in the
quadratic case, orders are always monogenic (i.e. of the form $\mathcal{O} = \mathbb{Z}[x]$).

Proof. Recall that $\mathcal{O}_d = \mathbb{Z}[x]$ for $x = \frac{d+\sqrt{d}}{2}$. Assume now that $I$ is a proper $\mathcal{O}_d$-ideal
and consider the 2-dimensional $\mathbb{F}_p$-vector space $I_p/pI_p = I/pI$. The natural map

\[ (\mathcal{O}_d)_p / p(\mathcal{O}_d)_p \to \mathrm{End}_{\mathbb{F}_p}(I_p/pI_p) \]

is injective. To see this, suppose that $\lambda \in (\mathcal{O}_d)_p$ acts trivially on $I_p/pI_p$. Then
$\lambda I_p \subset pI_p$, so $\frac{\lambda}{p} I_p \subset I_p$ and hence $\frac{\lambda}{p} \in (\mathcal{O}_d)_p$, as required. It follows that $\bar{x}$, the image of $x$
in $\mathrm{End}_{\mathbb{F}_p}(I_p/pI_p)$, has a minimal polynomial of degree 2 and that $I_p/pI_p$ is a cyclic
$\mathbb{F}_p[\bar{x}]$-module. So there exists $\lambda_p \in I_p$ such that $I_p = \lambda_p(\mathcal{O}_d)_p + pI_p$, which implies
that

\[ I_p = \lambda_p(\mathcal{O}_d)_p + p(\lambda_p(\mathcal{O}_d)_p + pI_p) = \lambda_p(\mathcal{O}_d)_p + p^2 I_p = \lambda_p(\mathcal{O}_d)_p + p^3 I_p = \dots = \lambda_p(\mathcal{O}_d)_p. \]

□

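2.3. Interpretation in terms of lattices. Let us verify that the various descriptions
of $\mathcal{G}_d$ are equivalent:

Given $(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)$, put $h_{a,b,c} = \begin{pmatrix} b+\sqrt{d} & b-\sqrt{d} \\ 2c & 2c \end{pmatrix}$ and $w = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in
\mathrm{SL}_2(\mathbb{Z})$. Then $w h_{a,b,c}$ maps $(\infty, 0)$ to the pair of end points $\frac{-b \pm \sqrt{d}}{2a}$. Therefore, the geodesic $\gamma_{[a,b,c]}$ on
$\mathrm{PSL}_2(\mathbb{Z})\backslash\mathbb{H}$ associated to $(a,b,c)$ after equation (1.3) is:

\[ \gamma_{[a,b,c]} = w h_{a,b,c}.(0,\infty), \]

where $(0,\infty)$ is the geodesic on $\mathbb{H}$ joining $0$ and $\infty$. Now $(0,\infty)$ corresponds, in
the realization $T^1(\mathbb{H})$, to the $A$-orbit of the identity in $\mathrm{SL}_2(\mathbb{R})$; therefore $\gamma_{[a,b,c]}$
corresponds to $\mathrm{SL}_2(\mathbb{Z}) \cdot w h_{a,b,c} A = \mathrm{SL}_2(\mathbb{Z}) \cdot h_{a,b,c} A$, or equivalently to the lattices of
the form $\mathbb{Z}^2 \cdot h_{a,b,c}\, a \subset \mathbb{R}^2$ ($a \in A$). Now one calculates

\[ \frac{1}{\det(h_{a,b,c})}\, h_{a,b,c} \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} h_{a,b,c}^t = \frac{1}{\sqrt{d}} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}, \]

which shows that in a particular basis of $\mathbb{Z}^2 h_{a,b,c}$ the quadratic form $q_0(x,y) = xy$
takes the shape as in (2.4).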

Since $A$ is the stabilizer subgroup of $q_0$, we have verified that $\gamma_{[a,b,c]}$ corresponds
to:

The set of homothety classes of lattices $L$ such that the restriction
of the quadratic form $q_0(x,y) = xy$ to $L$, expressed in terms of a
basis $\alpha, \beta$ of $L$, takes the form

\[ q_0(u\alpha + v\beta) = \mathrm{vol}(L)\, \frac{au^2 + buv + cv^2}{\sqrt{d}}. \tag{2.4} \]

Note that the particular quadratic form $au^2 + buv + cv^2$ is not canonically attached
to the lattice $L$ because of the different choices of a basis.

Set $m_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and let $\iota_0$ be the embedding $\iota_0 : K \hookrightarrow \mathrm{Diag}_2(\mathbb{R}) \subset M_2(\mathbb{R})$
obtained by mapping $\sqrt{d}$ to $d^{1/2} m_0$, and $\theta_0$ be the linear embedding $\theta_0 : K \hookrightarrow \mathbb{R}^2$
given by

\[ \theta_0(\lambda) = (1,1).\iota_0(\lambda), \quad \text{i.e.} \quad \theta_0(u + v\sqrt{d}) = (u + v|d|^{1/2},\ u - v|d|^{1/2}). \]

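Now let us verify, as asserted in §2.1, that the $A$-orbit of $\theta_0(I)$ belongs to $\mathcal{G}_d$
for any proper $\mathcal{O}_d$-ideal $I$. (We don't verify the more precise assertion that this
is exactly the element of $\mathcal{G}_d$ that corresponds to the class of $I$ under the bijection
$\mathrm{Cl}(\mathcal{O}_d) \leftrightarrow [\mathcal{R}_{\mathrm{disc}}(d)]$.) We need to verify (according to (2.4)) that
$\lambda \in I \mapsto \frac{\sqrt{d}\, q_0(\theta_0(\lambda))}{\mathrm{vol}(\theta_0(I))}$
is a quadratic form of discriminant $d$. But $q_0(\theta_0(\lambda)) = N_{K/\mathbb{Q}}(\lambda)$ is the norm, and
for any ideal $I \subset K$ one has $\mathrm{vol}(\theta_0(I)) = |d|^{1/2} N(I)$. Here we have defined the norm $N(I)$
of an ideal (relative to $\mathcal{O}_d$) by the ratio of indexes

\[ N(I) = \frac{(\mathcal{O}_d : \mathcal{O}_d \cap I)}{(I : \mathcal{O}_d \cap I)}. \]

Now, for any ideal $I$, the map $x \in I \mapsto \frac{N_{K/\mathbb{Q}}(x)}{N(I)}$ is easily verified to be an integer
quadratic form of discriminant $d$, as desired.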
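
The last claim can be tested on explicit ideals. In the sketch below an element $x + y\sqrt{d}$ of $K$ is stored as a pair `(x, y)` of rationals, and `norm_form` returns the coefficients of $(u, v) \mapsto N_{K/\mathbb{Q}}(u\alpha + v\beta)/N(I)$ for $I = \mathbb{Z}\alpha + \mathbb{Z}\beta$. The function name and this encoding are assumptions of the illustration; the code does not check that the input lattice really is an $\mathcal{O}_d$-ideal, and it computes $N(I)$ as a ratio of covolumes, using that the basis $1, \frac{d+\sqrt{d}}{2}$ of $\mathcal{O}_d$ has coordinate determinant $1/2$.

```python
from fractions import Fraction


def norm_form(alpha, beta, d):
    """Coefficients (A, B, C) of N(u*alpha + v*beta) / N(I), I = Z*alpha + Z*beta."""
    x1, y1 = (Fraction(t) for t in alpha)
    x2, y2 = (Fraction(t) for t in beta)
    # Covolume of I relative to O_d, whose basis has determinant 1/2.
    ideal_norm = abs(x1 * y2 - x2 * y1) / Fraction(1, 2)
    A = (x1 * x1 - d * y1 * y1) / ideal_norm
    B = 2 * (x1 * x2 - d * y1 * y2) / ideal_norm
    C = (x2 * x2 - d * y2 * y2) / ideal_norm
    return A, B, C
```

For $d = 377$ and the lattice $I = \mathbb{Z} \cdot 2 + \mathbb{Z} \cdot \frac{1+\sqrt{377}}{2}$ this gives the form $2u^2 + uv - 47v^2$, whose discriminant is $1 + 376 = 377$.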

2.4. A duality principle. Our goal now is to show that the equidistribution statements
of Skubenko's theorem and of Duke's theorem are equivalent.

The discussion which follows is valid in great generality; but we will consider
only $G = \mathrm{PGL}_2(\mathbb{R})$, $\Gamma = \mathrm{PGL}_2(\mathbb{Z})$, and the diagonal torus $A$ in $G$.

Since $\mathrm{PGL}_2(\mathbb{R})$ is identified with $\mathrm{SO}_{\mathrm{disc}}(\mathbb{R})$, it acts transitively on $V_{\mathrm{disc},+1}(\mathbb{R})$ (by
Witt's theorem), which therefore equals the $\mathrm{PGL}_2(\mathbb{R})$-orbit of (say) $q_0(x,y) = xy$; equivalently
$V_{\mathrm{disc},+1}(\mathbb{R})$ is identified with the $\mathrm{PGL}_2(\mathbb{R})$-conjugacy class of the matrix $m_0$, which
has $A$ as its stabilizer subgroup in $G$. Hence

\[ V_{\mathrm{disc},+1}(\mathbb{R}) = \mathrm{PGL}_2(\mathbb{R}).q_0 \simeq \mathrm{PGL}_2(\mathbb{R}).m_0 \simeq \mathrm{PGL}_2(\mathbb{R})/A. \]

2.4.1. Duality between orbits. It follows from the previous discussion that each representation
$(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)$ is identified with some class $g_{a,b,c}A/A \in G/A$ or, what
is the same, with an orbit $g_{a,b,c}A \subset G$ for some $g_{a,b,c} \in G$ such that

\[ g_{a,b,c}.q_0 = |d|^{-1/2}(a,b,c), \qquad q_0 = (0,1,0). \]

As we have seen, $\Gamma$ acts on $\mathcal{R}_{\mathrm{disc}}(d)$ and the latter decomposes into a finite disjoint
union of $\Gamma$-orbits; setting

\[ [a,b,c] = \Gamma.(a,b,c) \in [\mathcal{R}_{\mathrm{disc}}(d)] \]

for the orbit of $(a,b,c)$, one has

\[ \mathcal{R}_{\mathrm{disc}}(d) = \bigsqcup_{[a,b,c] \in [\mathcal{R}_{\mathrm{disc}}(d)]} \Gamma.(a,b,c). \]

Hence $|d|^{-1/2}.\mathcal{R}_{\mathrm{disc}}(d)$ is identified with the collection of $\Gamma$-orbits

\[ \bigsqcup_{[a,b,c] \in [\mathcal{R}_{\mathrm{disc}}(d)]} \Gamma g_{a,b,c} A/A \subset G/A; \]

thus the problem of the distribution of $|d|^{-1/2}.\mathcal{R}_{\mathrm{disc}}(d)$ inside $V_{\mathrm{disc},+1}(\mathbb{R})$ is a problem
about the distribution of a collection of $\Gamma$-orbits inside the quotient space $G/A$.

There is an almost tautological equivalence between (left) $\Gamma$-orbits on $G/A$ and
(right) $A$-orbits on $\Gamma\backslash G$ given by

\[ \Gamma g A/A \longleftrightarrow \Gamma g A \longleftrightarrow \Gamma\backslash\Gamma g A. \tag{2.5} \]

This duality induces a close relationship between the study of the distribution of
$|d|^{-1/2}.\mathcal{R}_{\mathrm{disc}}(d)$ inside $V_{\mathrm{disc},+1}(\mathbb{R})$ and the distribution of the collection of right $A$-orbits

\[ \mathcal{G}_d = \bigcup_{[a,b,c] \in [\mathcal{R}_{\mathrm{disc}}(d)]} x_{[a,b,c]} A \subset \Gamma\backslash G \]

inside the homogeneous space $\Gamma\backslash G$, with

\[ x_{[a,b,c]} = \Gamma\backslash\Gamma g_{a,b,c}. \tag{2.6} \]

This is the "duality principle" alluded to at the beginning of this section. Let us
make this principle a bit more precise by identifying the orbits in question:
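Assuming that $(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)$ is primitive, one has

\[ x_{[a,b,c]}A = \Gamma\backslash\Gamma g_{a,b,c} A = \Gamma\backslash\Gamma A_{a,b,c}\, g_{a,b,c} \]

where

\[ A_{a,b,c} = g_{a,b,c} A g_{a,b,c}^{-1} = \mathrm{Stab}_{(a,b,c)}(G) \]

is the stabilizer of $(a,b,c)$ in $G$. That group is the group of real points of a $\mathbb{Q}$-algebraic
group, which we will denote by $\mathbf{T}_{a,b,c}$, namely the image in $\mathrm{PGL}_2$ of the
centralizer $Z_m$ of

\[ \begin{pmatrix} b & 2c \\ -2a & -b \end{pmatrix}. \]

In terms of the embedding $\iota = \iota_{m_{a,b,c}} : K \hookrightarrow M_2(\mathbb{Q})$, one has

\[ Z_m(\mathbb{Q}) = \iota(K^\times), \]

and

\[ \mathbf{T}(\mathbb{Q}) = \iota(K^\times)/\mathbb{Q}^\times \mathrm{Id}, \qquad A_{a,b,c} = \mathbf{T}_{a,b,c}(\mathbb{R}) = \iota(K \otimes \mathbb{R})^\times/\mathbb{R}^\times \mathrm{Id}, \]

and (since $M_2(\mathbb{Z}) \cap \iota(K) = \iota(\mathcal{O}_d)$),

\[ \Gamma_{a,b,c} := \Gamma \cap A_{a,b,c} = \iota(\mathcal{O}_d^\times)/\{\pm \mathrm{Id}\}. \]

Alternatively, let $\iota_0$ denote the (real) embedding

\[ \iota_0 : K \hookrightarrow M_2(\mathbb{R}), \qquad u + v\sqrt{d} \mapsto u\,\mathrm{Id} + v.d^{1/2} m_0 \]

obtained by conjugating $\iota_m$ with $g_{a,b,c}^{-1}$; we have

\[ \iota_0(K \otimes_{\mathbb{Q}} \mathbb{R})^\times/\mathbb{R}^\times \mathrm{Id} = A \]

and

\[ g_{a,b,c}^{-1}\, \Gamma_{a,b,c}\, g_{a,b,c} = g_{a,b,c}^{-1}\, \Gamma\, g_{a,b,c} \cap A = \iota_0(\mathcal{O}_d^\times)/\{\pm \mathrm{Id}\} \]

so that we have homeomorphisms

\[ x_{[a,b,c]}A = \Gamma\backslash\Gamma g_{a,b,c} A \simeq (g_{a,b,c}^{-1}\, \Gamma\, g_{a,b,c} \cap A)\backslash A = \iota_0(K \otimes \mathbb{R})^\times/\mathbb{R}^\times \iota_0(\mathcal{O}_d^\times). \tag{2.7} \]

By Dirichlet's unit theorem, $\iota_0(K \otimes \mathbb{R})^\times/\mathbb{R}^\times \iota_0(\mathcal{O}_d^\times)$ is compact, hence $x_{[a,b,c]}.A$ is
compact, and since $[\mathcal{R}_{\mathrm{disc}}(d)]$ is finite we obtain:

Theorem 2.2. The union of $A$-orbits $\mathcal{G}_d$ is compact.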

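2.4.2. Duality between measures. To consider equidistribution problems, one needs
to refine the correspondence (2.5) at the level of measures. Roughly speaking, the
choice of the counting measure $\mu_\Gamma$ on $\Gamma$ and of a left-invariant Haar measure $\mu_A$ on$^3$
$A$ defines a measure theoretic version of the correspondence (2.5):

Fact. There exist homeomorphisms between the following spaces of Radon measures
(relative to the weak-* topology):

\[ \left\{ \begin{array}{c} \text{left } \Gamma\text{-invariant} \\ \text{Radon measures} \\ \lambda \text{ on } G/A \end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text{left } \Gamma\text{, right } A\text{-invariant} \\ \text{Radon measures} \\ \rho \text{ on } G \end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text{right } A\text{-invariant} \\ \text{Radon measures} \\ \nu \text{ on } \Gamma\backslash G \end{array} \right\}. \tag{2.8} \]

These homeomorphisms are characterized by the identities: for any $\varphi \in \mathcal{C}_c(G)$, one
has

\[ \lambda(\varphi_A) = \rho(\varphi) = \nu(\varphi_\Gamma) \]

where

\[ \varphi_A(g) := \int_A \varphi(gh)\, d\mu_A(h), \qquad \varphi_\Gamma(g) = \sum_{\gamma \in \Gamma} \varphi(\gamma.g). \]

See for instance [2, §8.1] for a proof of that fact. We work out this correspondence
in specific cases:

- $\rho$ is a Haar measure $\mu_G$ on $G$, which is $G$-biinvariant as $G$ is unimodular. The
correspondence (2.8) yields the quotient measures $\nu = \mu_{\Gamma\backslash G}$ on $\Gamma\backslash G$, and $\lambda =
\mu_{G/A} \propto \mu_{\mathrm{disc},\pm 1}$ on $G/A$. The former measure $\nu$ is finite (i.e. $\Gamma$ is a lattice in $G$)
and we may adjust $\mu_G$ so that $\mu_{\Gamma\backslash G}$ is a probability measure.

- The sum $\lambda_d$ of Dirac measures on $G/A$ given by

\[ \lambda_d = \sum_{(a,b,c) \in \mathcal{R}_{\mathrm{disc}}(d)} \delta_{g_{a,b,c}A/A} = \sum_{[a,b,c]}\ \sum_{gA/A \in \Gamma.g_{a,b,c}A/A} \delta_{gA/A} = \sum_{[a,b,c]}\ \sum_{\gamma \in \Gamma/\Gamma_{a,b,c}} \delta_{\gamma g_{a,b,c}A/A}. \]

$^3$ Note that $A$ is unimodular.

Proposition. The measure $\nu_d$ on $\Gamma\backslash G$ corresponding to $\lambda_d$ under (2.8) is the
sum of the push forwards of the Haar measure $\mu_A$ over the set of $A$-orbits
$x_{[a,b,c]}A$, $[a,b,c] \in [\mathcal{R}_{\mathrm{disc}}(d)]$.

Indeed, set $\lambda_{[a,b,c]} = \sum_{\gamma \in \Gamma/\Gamma_{a,b,c}} \delta_{\gamma g_{a,b,c}A/A}$. Then if $S$ denotes a fundamental
domain in $A$ for $\Gamma_{a,b,c}$,

\[ \lambda_{[a,b,c]}(\varphi_A) = \sum_{\gamma \in \Gamma/\Gamma_{a,b,c}} \int_A \varphi(\gamma g_{a,b,c} h)\, dh = \sum_{\gamma \in \Gamma} \int_S \varphi(\gamma g_{a,b,c} h)\, dh \]
\[ = \int_S \varphi_\Gamma(g_{a,b,c} h)\, dh = \int_{x_{[a,b,c]}A} \varphi_\Gamma(h)\, dh, \]

hence the measure on $\Gamma\backslash G$ corresponding to $\lambda_{[a,b,c]}$ is given by the push forward of
the Haar measure $\mu_A$ to the periodic $A$-orbit $x_{[a,b,c]}A$, and the proposition follows.
Let

\[ \mathrm{vol}(\mathcal{G}_d) := \nu_d(\mathcal{G}_d) = \sum_{[a,b,c]} \mathrm{vol}(x_{[a,b,c]}A) \]

denote the total volume of this (finite) collection of (compact) $A$-orbits. From (2.7)
we see that the various orbits associated to primitive representations of $d$ have the
same volume, namely, with the correct normalization of the Haar measure of $A$,

\[ \mathrm{vol}(x_{[a,b,c]}A) = \mathrm{vol}\bigl(\iota_0(K \otimes \mathbb{R})^\times/\mathbb{R}^\times \iota_0(\mathcal{O}_d^\times)\bigr) = \mathrm{Reg}(\mathcal{O}_d) \]

where $\mathrm{Reg}(\mathcal{O}_d)$ is the regulator of $\mathcal{O}_d$. Therefore,

\[ \mathrm{vol}(\mathcal{G}_d) = |\mathrm{Pic}(\mathcal{O}_d)|\, \mathrm{Reg}(\mathcal{O}_d). \]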
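
To get a feeling for the size of $\mathrm{Reg}(\mathcal{O}_d)$, one can locate the fundamental unit of $\mathcal{O}_d$ by brute force. The sketch below assumes the usual description of the units of $\mathcal{O}_d$ as the elements $\frac{t + u\sqrt{d}}{2}$ with $t^2 - du^2 = \pm 4$, and takes the regulator to be the logarithm of the smallest unit greater than 1; whether this matches the normalization of Haar measure used in the text is not checked here. The names `fundamental_unit` and `regulator` are ours, and the search is only practical for small $d$.

```python
import math


def fundamental_unit(d: int):
    """Return (t, u) with (t + u*sqrt(d))/2 the smallest unit > 1 of O_d.

    Brute-force search over u for solutions of t^2 - d*u^2 = -4 or +4.
    """
    if d <= 0 or d % 4 not in (0, 1) or math.isqrt(d) ** 2 == d:
        raise ValueError("d must be a positive non-square discriminant")
    u = 1
    while True:
        for n in (-4, 4):  # -4 first, so the smaller t wins for a given u
            t2 = d * u * u + n
            t = math.isqrt(t2)
            if t * t == t2:
                return t, u
        u += 1


def regulator(d: int) -> float:
    """log of the fundamental unit of O_d (floating point)."""
    t, u = fundamental_unit(d)
    return math.log((t + u * math.sqrt(d)) / 2)
```

For example, `fundamental_unit(5)` returns `(1, 1)`, the golden ratio $\frac{1+\sqrt{5}}{2}$, and `fundamental_unit(8)` returns `(2, 1)`, that is $1 + \sqrt{2}$.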

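If $d = \mathrm{disc}(\mathcal{O}_K)$ is a fundamental discriminant, the Dirichlet class number formula
gives

\[ \mathrm{vol}(\mathcal{G}_d) = |\mathrm{Pic}(\mathcal{O}_d)|\, \mathrm{Reg}(\mathcal{O}_d) = \lambda |d|^{1/2} L\bigl(\bigl(\tfrac{d}{\cdot}\bigr), 1\bigr) \]

where $\lambda$ is some absolute constant, $\bigl(\tfrac{d}{\cdot}\bigr)$ is the Kronecker symbol and $L\bigl(\bigl(\tfrac{d}{\cdot}\bigr), s\bigr)$ its
associated $L$-function. Then by Siegel's theorem $L\bigl(\bigl(\tfrac{d}{\cdot}\bigr), 1\bigr) = |d|^{o(1)}$ as $d \to \infty$, so
that

\[ \mathrm{vol}(\mathcal{G}_d) = |d|^{1/2 + o(1)}. \tag{2.9} \]

If $d = d' f^2$ with $d'$ a fundamental discriminant,

\[ \frac{|\mathrm{Pic}(\mathcal{O}_d)|\, \mathrm{Reg}(\mathcal{O}_d)}{|\mathrm{Pic}(\mathcal{O}_{d'})|\, \mathrm{Reg}(\mathcal{O}_{d'})} = f \prod_{p | f} \left( 1 - \left(\frac{d'}{p}\right) p^{-1} \right), \]

which shows again that $|\mathrm{Pic}(\mathcal{O}_d)|\, \mathrm{Reg}(\mathcal{O}_d) = |d|^{1/2 + o(1)}$ and hence (2.9) holds in
general (cf. e.g. [10, Sect. 9.6]). We let

\[ \mu_d := \frac{1}{\mathrm{vol}(\mathcal{G}_d)}\, \nu_d. \]

This is an $A$-invariant probability measure on $\Gamma\backslash G$, and the above discussion shows
that Skubenko's theorem (Theorem 1.2) follows from the following:

Theorem 2.3. As $d \to \infty$ amongst the non-square discriminants, the sequence of
measures $\mu_d$ weak-* converges to the probability measure $\mu_{\Gamma\backslash G}$, i.e. for any $\varphi_\Gamma \in
\mathcal{C}_c(\Gamma\backslash G)$, one has

\[ \mu_d(\varphi_\Gamma) = \frac{1}{\mathrm{vol}(\mathcal{G}_d)} \int_{\mathcal{G}_d} \varphi_\Gamma(h)\, dh \to \mu_{\Gamma\backslash G}(\varphi_\Gamma). \]

Indeed any continuous compactly supported function on $G/A$ is of the form $\varphi_A$
for $\varphi \in \mathcal{C}_c(G)$, hence by Theorem 2.3

\[ \lambda_d(\varphi_A) = \nu_d(\varphi_\Gamma) = \mathrm{vol}(\mathcal{G}_d)\, \mu_d(\varphi_\Gamma) \]
\[ = \mathrm{vol}(\mathcal{G}_d)\bigl(\mu_{\Gamma\backslash G}(\varphi_\Gamma) + o(1)\bigr) = \mathrm{vol}(\mathcal{G}_d)\bigl(\mu_{G/A}(\varphi_A) + o(1)\bigr). \]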

3. Spacing properties of torus orbits

In this section, we show that the various distinct orbits $x_{[a,b,c]}A \subset \mathcal{G}_d$ are in
a suitable sense well spaced from each other; the main result is Proposition 3.6.
Recall that

\[ \mathcal{G}_d = \bigcup_{[a,b,c] \in [\mathcal{R}_{\mathrm{disc}}(d)]} x_{[a,b,c]}A, \]

where $x_{[a,b,c]}$ is defined in (2.6).

3.1. Ideal classes are controlling the time spent near the cusp. The space
$X$ is not compact and this is measured through a height function (normalized to
be invariant under scaling) given, for a lattice $L = \mathbb{Z}^2.g \subset \mathbb{R}^2$, by

\[ \mathrm{ht}(L) = \left( \frac{\min_{v \in L - \{0\}} \|v\|}{\mathrm{vol}(L)^{1/2}} \right)^{-1} = \left( \frac{\min_{x \in \mathbb{Z}^2 - \{0\}} \|xg\|}{|\det(g)|^{1/2}} \right)^{-1}, \]

where $\|.\|$ denotes the Euclidean norm. This continuous function is proper. Indeed,
if $x \in X$ and $(z, v) \in \mathcal{F}$ is any representative, then the height $\mathrm{ht}(x)$ and the imaginary
part satisfy $\Im(z) = \mathrm{ht}(x)^2$. For any $H > 1$ let $X_{>H}$ denote the set of all $x \in X$
with $\mathrm{ht}(x) > H$.

In this section we evaluate explicitly how big the height of a lattice in Sf d could 

be. 

Proposition 3.1. Suppose the proper integral ideal $J \subset \mathcal{O}_d$ corresponds to $[a,b,c] \in \mathrm{R}_{\mathrm{disc}}(d)$ under the bijection of §2.1. Then $x_{[a,b,c]}A \cap X_{\ge H}$ is nonempty if and only if $J^{-1}$ is equivalent to an ideal $I \subset \mathcal{O}_d$ of norm $< \frac12H^{-2}d^{1/2}$. Moreover, this defines a bijection between connected components of $\mathscr{G}_d \cap X_{\ge H}$ and proper $\mathcal{O}_d$-ideals $I \subset \mathcal{O}_d$ of norm $< \frac12H^{-2}d^{1/2}$.

Even though the above does not control escape of mass for $\mu_d$ as $d \to \infty$ it does give an upper bound for $\mu_d(X_{\ge H})$, see Proposition 3.3, which we will use in our proof of Duke's theorem. Note that Proposition 2.1 guarantees that there is an inverse $J^{-1}$ to the proper ideal $J$.

Remark 3.2. Applying this result to $H = d^{1/4}$ we see that $\mathscr{G}_d \cap X_{\ge d^{1/4}}$ is empty (as there are no ideals of norm $< 1$). This implies that $\mathscr{G}_d$ is pre-compact.

Proof. Note that, if we identify $x \in X$ with a lattice $L$ of covolume 1, then $xA \cap X_{\ge H}$ is nonempty if and only if there is some nonzero vector $(u,v) \in L$ with $|uv| < \frac12H^{-2}$. Therefore (using the explicit bijection of §2.1) the $A$-orbit defined by $J$ intersects $X_{\ge H}$ if and only if $J$ contains an element $\lambda$ with

$$|N(\lambda)| < \tfrac12H^{-2}N(J)d^{1/2}.$$

Recall that $N(J^{-1}) = N(J)^{-1}$ by standard properties of the norm. It follows that the $A$-orbit defined by $J$ intersects $X_{\ge H}$ if and only if $N(\lambda J^{-1}) < \frac12H^{-2}d^{1/2}$ for some $\lambda \in J$ (so that $\lambda J^{-1} \subset \mathcal{O}_d$).

Finally, notice that for $H > 1$ there is, in a lattice $L' \in X_{\ge H}$, up to sign, only one primitive nonzero vector of length $< H^{-1}\mathrm{vol}(L')^{1/2}$ (which is a simple volume computation). Therefore, fixing $J$, in the above argument, a connected component of the intersection of the $A$-orbit defined by $J$ with $X_{\ge H}$ corresponds to a unique primitive element $\lambda \in J$ with $|N(\lambda)| < \frac12H^{-2}N(J)d^{1/2}$ (up to sign) and we can associate to this connected component the ideal $I = \lambda J^{-1} \subset \mathcal{O}_d$ of norm $< \frac12H^{-2}d^{1/2}$. □
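The first step of the proof rests on an elementary computation: under the diagonal flow a vector $(u,v)$ becomes $(e^{t/2}u, e^{-t/2}v)$, and its squared Euclidean length $u^2e^{t} + v^2e^{-t}$ has minimum $2|uv|$. The following Python sketch checks this numerically. The parametrization $a_t = \mathrm{diag}(e^{t/2}, e^{-t/2})$, the grid search and the function names are assumptions made for illustration only.

```python
import math

def min_sq_length_along_orbit(u, v, t_min=-40.0, t_max=40.0, steps=20001):
    """Minimize |(e^{t/2} u, e^{-t/2} v)|^2 = u^2 e^t + v^2 e^{-t} over a grid of t."""
    best = float("inf")
    for i in range(steps):
        t = t_min + (t_max - t_min) * i / (steps - 1)
        best = min(best, u * u * math.exp(t) + v * v * math.exp(-t))
    return best

def enters_cusp(u, v, H):
    """Closed-form criterion: some point of the orbit has a vector of
    length below 1/H exactly when 2|uv| < H^-2 (lattice of covolume 1)."""
    return abs(u * v) < 0.5 * H ** -2
```

For instance, `min_sq_length_along_orbit(3.0, 0.01)` is close to `2 * 3.0 * 0.01`, which is the quantity compared with $H^{-2}$ in the criterion $|uv| < \frac12H^{-2}$.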

Proposition 3.3. There is "not too much mass high in the cusp" in the sense that

$$\mu_d(X_{\ge H}) \ll_\varepsilon d^{\varepsilon}H^{-2}$$

for all $\varepsilon > 0$ and $H > 1$.

Note that to make this estimate useful, we will set later $H = d^{\varepsilon}$ for some $\varepsilon > 0$.

Proof. We note first that in any orbit in $\mathscr{G}_d$ the maximal height achieved is $< d^{1/4}$ (see Remark 3.2). This implies that for $H > 1$ any connected component of $\mathscr{G}_d \cap X_{\ge H}$ has length $\ll \log(d)$. Indeed such a component corresponds (in the upper-half plane model) to the segment of some oriented geodesic circle (i.e. a half circle centered on the real line) made of those points which have imaginary part between $H$ and $d^{1/4}$: the hyperbolic length of such a segment is bounded by $\ll \log(d^{1/4}/H)$. Therefore, by Proposition 3.1

$$\mathrm{vol}(\mathscr{G}_d \cap X_{\ge H}) \ll \log(d)\,N_{<H}(d)$$

where $N_{<H}(d)$ is the number of proper ideals $I \subset \mathcal{O}_d$ of norm $N(I) < \frac12H^{-2}d^{1/2}$. Recall that for any $n \in \mathbb{N}$ the number of proper ideals in $\mathcal{O}_d$ of norm equal to $n$ is bounded by the number of divisors of $n$ and so by $\ll_\varepsilon n^{\varepsilon}$. By summing over all $1 \le n < \frac12H^{-2}d^{1/2}$ we get that $N_{<H}(d) \ll_\varepsilon (H^{-2}d^{1/2})^{1+\varepsilon}$. Together with (2.9) this proves the proposition. □
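The counting step above only uses that the number of ideals of norm $n$ is at most the number of divisors of $n$. As a rough numerical illustration (a sketch, not part of the argument), the following Python code tabulates the divisor function with a sieve and sums it; the function names are hypothetical.

```python
def divisor_counts(limit):
    """tau[n] = number of positive divisors of n for 1 <= n <= limit (tau[0] unused)."""
    tau = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            tau[multiple] += 1
    return tau

def divisor_sum(x):
    """Sum of tau(n) over 1 <= n <= x, an upper bound for the number of
    ideals of norm at most x under the divisor bound quoted in the proof."""
    return sum(divisor_counts(x)[1:])
```

Comparing `divisor_sum(x)` with `x ** (1 + eps)` for a few values of `x` gives a feeling for the bound $N_{<H}(d) \ll_\varepsilon (H^{-2}d^{1/2})^{1+\varepsilon}$.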

3.2. Linnik's basic lemma and representing binary quadratic forms by ternary forms. Following Linnik we will derive the "basic lemma" from representation numbers of quadratic forms: Let $q$, $Q$ be two integral non-degenerate quadratic forms on $\mathbb{Z}^m$ and $\mathbb{Z}^n$ respectively. Assuming that $m \le n$, a representation of $q$ by $Q$ is an isometric embedding of quadratic lattices

$$\iota : (\mathbb{Z}^m, q) \hookrightarrow (\mathbb{Z}^n, Q),$$

in other terms a $\mathbb{Z}$-linear map $\iota : \mathbb{Z}^m \to \mathbb{Z}^n$ such that for $x \in \mathbb{Z}^m$

$$Q(\iota(x)) = q(x).$$

For instance a representation $x \in \mathbb{Z}^n$ of an integer $d \in \mathbb{Z}$ by a quadratic form $Q$ on $\mathbb{Z}^n$ may be viewed as the isometric embedding

$$(\mathbb{Z}, dx^2) \hookrightarrow (\mathbb{Z}^n, Q),\qquad n \mapsto nx.$$
Let $\mathrm{R}_Q(q)$ be the set of such representations: The group $\Gamma = \mathrm{SO}_Q(\mathbb{Z})$ acts on $\mathrm{R}_Q(q)$ (for $\gamma \in \Gamma$, $\gamma.\iota = \gamma\circ\iota$) and the quotient $\Gamma\backslash\mathrm{R}_Q(q)$ is finite.

We are interested here in evaluating $|\Gamma\backslash\mathrm{R}_Q(q)|$ in the codimension one case (i.e. when $n - m = 1$). More precisely, we will need to show that, in this case, $|\Gamma\backslash\mathrm{R}_Q(q)|$ is rather small. The simplest evidence comes from the case $m = 1$, $n = 2$: the representations of an integer by a binary quadratic form. For instance it is well known that for $d \neq 0$ the number of integral solutions to $xy = d$ (i.e. the number of divisors of $d$) is bounded by $O_\varepsilon(|d|^{\varepsilon})$. Similarly the number of representations of an integer as a sum of two squares satisfies the same bound; indeed, for any binary integral quadratic form $Q$ one has $|\Gamma\backslash\mathrm{R}_Q(d)| \ll_{Q,\varepsilon} |d|^{\varepsilon}$ for any $\varepsilon > 0$. The following is a version of this claim for $m = 2$, $n = 3$, where in the case of non-fundamental discriminants the estimate is not as strong.
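The two binary examples just mentioned are easy to experiment with. The sketch below counts all integer solutions by brute force (it does not pass to orbits under $\mathrm{SO}_Q(\mathbb{Z})$, so it only illustrates the size of the counts); the function names are hypothetical.

```python
import math

def count_xy_equals(d):
    """Number of integer pairs (x, y) with x * y == d, for d != 0."""
    n = abs(d)
    positive_divisors = sum(1 for k in range(1, n + 1) if n % k == 0)
    return 2 * positive_divisors  # x runs over all divisors, with either sign

def count_sum_of_two_squares(d):
    """Number of integer pairs (x, y) with x^2 + y^2 == d, for d >= 0."""
    count = 0
    r = math.isqrt(d)
    for x in range(-r, r + 1):
        y_squared = d - x * x
        y = math.isqrt(y_squared)
        if y * y == y_squared:
            count += 1 if y == 0 else 2  # y and -y, unless y == 0
    return count
```

Both counts stay very small compared with $|d|$, in line with the $O_\varepsilon(|d|^{\varepsilon})$ bounds quoted in the text.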

Proposition 3.4. Let $Q$ be an integral ternary quadratic form, and let

$$q(x,y) = ax^2 + bxy + cy^2$$

be an integral binary quadratic form, both non-degenerate. Assume that $f^2 \mid \gcd(a,b,c)$ is the greatest common square divisor of $a, b, c$. Then the number $N$ of embeddings of $(\mathbb{Z}^2, q)$ into $(\mathbb{Z}^3, Q)$, modulo the action of $\mathrm{SO}_Q(\mathbb{Z})$, is $\ll_{Q,\varepsilon} f\max(|a|,|b|,|c|)^{\varepsilon}$.

When Q = x 2 + y 2 + z 2 is the "sum of three squares" quadratic form such a 
bound is a consequence of an explicit formula on the number of representations due 
to Venkov [26] (assuming a square-free). This bound was later generalized by Pall
[21, Thm. 5]. We provide a self-contained treatment in Appendix A. 

Let

$$\langle(a,b,c),(a',b',c')\rangle_{\mathrm{disc}} = \mathrm{disc}(a+a', b+b', c+c') - \mathrm{disc}(a,b,c) - \mathrm{disc}(a',b',c') = 2bb' - 4ac' - 4a'c$$

be the polarization inner product associated with the quadratic form disc. We will apply Proposition 3.4 to the pair

$$Q = \mathrm{disc},\qquad q(x,y) = dx^2 + \ell xy + dy^2,$$

and note that $q(x,y)$ is non-degenerate if and only if $\ell \neq \pm 2d$. Hence we obtain:

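Corollary 3.5. Let $\Gamma = \mathrm{SO}_{\mathrm{disc}}(\mathbb{Z})$. Then for any two integers $d, \ell$ with $\ell \neq \pm 2d$, the number of $\Gamma$-orbits on pairs

$$\{((a,b,c),(a',b',c')) \in \mathbb{Z}^3\times\mathbb{Z}^3 : \mathrm{disc}(a,b,c) = \mathrm{disc}(a',b',c') = d,\ \langle(a,b,c),(a',b',c')\rangle_{\mathrm{disc}} = \ell\}$$

is $\ll_\varepsilon f(\max(|d|,|\ell|))^{\varepsilon}$, where $f^2$ is the largest square factor of $\gcd(d,\ell)$.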
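The polarization identity and the shape of the binary form $q(x,y) = \mathrm{disc}(xv + yw)$ can be checked mechanically. The following Python sketch does this for integer coefficient triples; the helper names `disc`, `polar` and `pencil_form` are introduced here for illustration only.

```python
def disc(a, b, c):
    """Discriminant b^2 - 4ac of the binary form a x^2 + b x y + c y^2."""
    return b * b - 4 * a * c

def polar(v, w):
    """Polarization of disc: disc(v + w) - disc(v) - disc(w)."""
    s = tuple(vi + wi for vi, wi in zip(v, w))
    return disc(*s) - disc(*v) - disc(*w)

def pencil_form(v, w):
    """Coefficients (A, B, C) with disc(x v + y w) = A x^2 + B x y + C y^2."""
    return disc(*v), polar(v, w), disc(*w)
```

For two triples $v, w$ of the same discriminant $d$ this returns $(d, \ell, d)$ with $\ell = 2bb' - 4ac' - 4a'c$, and the form is non-degenerate exactly when $\ell \neq \pm 2d$.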

We now translate the information obtained about quadratic forms above to Linnik's basic lemma, which we phrase in the geometric context. This falls short of equidistribution but will suffice as the arithmetic input to the ergodic arguments later.

Proposition 3.6 (Basic lemma). We have

$$\mu_d\times\mu_d\{(x,y) \in X_{<H}^2 : d_X(x,y) < \delta\} \ll_\varepsilon H^4\delta^3d^{\varepsilon}$$
whenever $d^{-1/4} \le \delta \le \frac12H^{-2}$ and $\varepsilon > 0$.
Note that the exponent 3 of $\delta^3$ is optimal, and suggests that $\mu_d$ is 3-dimensional in the appropriate scale. The trivial exponent is 1, which follows from $A$-invariance of $\mu_d$.

Proof. We start by indicating the relationship between $\delta$-close tuples in $(\mathscr{G}_d \cap X_{<H})^2$ and the representation of the binary quadratic form $q(x,y) = dx^2 + \ell xy + dy^2$ by the ternary quadratic form disc.

From (1.4), if $g_1, g_2 \in \mathrm{PSL}_2(\mathbb{R})$ are such that $x_i = \Gamma g_i \in \mathscr{G}_d \cap X_{<H}$ for $i = 1,2$ and $d_X(x_1,x_2) < \delta$, then we may assume

$$g_1 \in \mathscr{S},\quad g_2 \in \mathscr{S}',\quad \Gamma g_1 \in X_{<H}\quad\text{and}\quad d(g_1,g_2) < \delta, \tag{3.1}$$

where $\mathscr{S}'$ is some slightly bigger set containing the fundamental domain $\mathscr{S}$ in its interior. For concreteness we take

$$\mathscr{S}' = \{(z,v) \in \mathbb{H}\times S^1 : |\Re z| < 1,\ \Im z > 1/2\}.$$

This clearly shows that the matrix entries of both $g_i$ are controlled, i.e. $\|g_i\| \ll H$ where

$$\|g\| = \mathrm{tr}(g^tg)^{1/2}.$$

Moreover, we may associate to $g_i$ the primitive integral quadratic form

$$q_i(x,y) = \sqrt{d}\,g_i.q_0(x,y) = a_ix^2 + b_ixy + c_iy^2,\qquad b_i^2 - 4a_ic_i = d,\quad \gcd(a_i,b_i,c_i) = 1.$$

We have to consider two different possible cases. Either $q_1 = q_2$ (i.e. $g_2 \in g_1A$) or $q_1 \neq q_2$.

The total mass for the first case is easy to estimate by $\ll_\varepsilon d^{1/2+\varepsilon}\delta$ before normalization by the total volume, which gives after the normalization that

$$\mu_d\times\mu_d\{(\Gamma g_1,\Gamma g_1h) \in X_{<H}^2 : h \in A,\ d(\mathrm{Id},h) < \delta\} \ll_\varepsilon \delta d^{-1/2}d^{\varepsilon} \le \delta^3d^{\varepsilon}$$

since $d^{-1/4} \le \delta$.

Henceforth we assume $q_1 \neq q_2$. Since $\|g_i\| \ll H$, we have

$$\max(|a_i|,|b_i|,|c_i|) \ll d^{1/2}H^2. \tag{3.2}$$

Also by assumption $g_2 = g_1h$ with $d(h,\mathrm{Id}) < \delta$. This shows that $q_2 = \sqrt{d}\,g_1.(h.q_0)$ where $\|h.q_0 - q_0\| \ll \delta$. Therefore,

$$\max(|a_1 - a_2|, |b_1 - b_2|, |c_1 - c_2|) \ll d^{1/2}H^2\delta. \tag{3.3}$$

We now define

$$q(u,v) = \mathrm{disc}(u(a_1,b_1,c_1) + v(a_2,b_2,c_2)) = du^2 + \ell uv + dv^2.$$

From the bound (3.3) on the difference of the vectors we know

$$|q(1,-1)| = |2d - \ell| \ll dH^4\delta^2.$$

In order to apply Corollary 3.5 on $q$, we need to check that $q$ is not degenerate, i.e. that $\ell \neq \pm 2d$. Indeed, if $\ell = \pm 2d$ then

$$d(a_2 \mp a_1)^2 = q(a_2,-a_1) = \mathrm{disc}(a_2(a_1,b_1,c_1) - a_1(a_2,b_2,c_2)) = (a_2b_1 - a_1b_2)^2,$$

which contradicts the assumption that $d$ is not a perfect square. Therefore $\ell \neq \pm 2d$. In this case we may apply Corollary 3.5 to obtain the bound

$$N_{\ell,d} = \big|\mathrm{SO}_{\mathrm{disc}}(\mathbb{Z})\backslash\{(\mathbb{Z}^2, dx^2 + \ell xy + dy^2) \hookrightarrow (\mathbb{Z}^3,\mathrm{disc})\}\big| \ll_\varepsilon f\max(d,\ell)^{\varepsilon}$$

on the number $N_{\ell,d}$ of inequivalent ways in which the quadratic form $dx^2 + \ell xy + dy^2$ can be represented, where $f^2 \mid \gcd(d,\ell)$ is the greatest square divisor. Note that the group $\mathrm{SO}_{\mathrm{disc}}$ is rationally equivalent to $\mathrm{PGL}_2$, and so up to isogeny rationally equivalent to $\mathrm{SL}_2$. Therefore, $\mathrm{SO}_{\mathrm{disc}}(\mathbb{Z})$ is commensurable to the image of $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ and we may also use $\Gamma$ instead of $\mathrm{SO}_{\mathrm{disc}}(\mathbb{Z})$ in the above estimate.
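Let

$$(q_1^{(1)}, q_2^{(1)}),\ \dots,\ (q_1^{(k)}, q_2^{(k)})$$

be a complete list of diagonal $\Gamma$-orbits of pairs of quadratic forms which can be written as

$$q_i^{(j)}(x,y) = \sqrt{d}\,g_i^{(j)}.q_0(x,y)$$

with $g_1^{(j)}, g_2^{(j)}$ satisfying (3.1).

The number $k$ of these diagonal $\Gamma$-orbits of quadratic forms is bounded by

$$k \le \sum_{\substack{\ell = 2d-L \\ \ell \neq \pm 2d}}^{2d+L} N_{\ell,d} = \sum_{f^2 \mid d}\ {\sum_{\substack{|2d-\ell| \le L \\ f^2 \mid \ell,\ \ell \neq \pm 2d}}}' N_{\ell,d} \ll_\varepsilon \sum_{f^2 \mid d}\ {\sum_{\substack{|2d-\ell| \le L \\ f^2 \mid \ell,\ \ell \neq \pm 2d}}}' f d^{\varepsilon} \ll_\varepsilon \sum_{f^2 \mid d} f\,\frac{L}{f^2}\,d^{\varepsilon} \ll_\varepsilon d^{1+2\varepsilon}\delta^2H^4,$$

where $L \ll dH^4\delta^2$ and $\sum'$ denotes a sum over $\ell$ for which $\gcd(d,\ell)/f^2$ is square-free.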
We claim that for $q_1^{(j)} \neq q_2^{(j)}$ we have

$$d(g_1^{(j)}a_t, g_2^{(j)}A) \gg d^{-1}. \tag{3.4}$$

Indeed suppose $d(g_1^{(j)}a_t, g_2^{(j)}a_{t'}) < cd^{-1}$ (for some constant $c$ determined in a moment). Then we may find some $\gamma \in \Gamma$ with $\gamma g_1^{(j)}a_t \in \mathscr{S}$, which also implies $\gamma g_2^{(j)}a_{t'} \in \mathscr{S}'$. By Remark 3.2 we have $\mathscr{G}_d \subset X_{<H'}$ for $H' = d^{1/4}$. Hence by choosing $c$ appropriately the upper bound in (3.3) (applied for $H' = d^{1/4}$ and $\delta = cd^{-1}$) is less than one, which gives a contradiction.

Writing $g_2^{(j)} = g_1^{(j)}\exp v$ for some $v = v^- + v^+ + v^a \in \mathfrak{sl}_2(\mathbb{R})$, with $v^-, v^+, v^a$ eigenvectors of $\mathrm{Ad}\,a_t$ with eigenvalues $e^{-t}, e^{t}, 1$ respectively, the estimate (3.4) implies that both $\|v^-\|, \|v^+\| \gg d^{-1}$. It follows that for any $j$ the inequality

$$d(g_1^{(j)}a_t, g_2^{(j)}A) < 1 \tag{3.5}$$

can hold only for $t$ in some interval $I_j$ of length $\ll \log d$.

Claim: For each pair $(g_1^{(j)}, g_2^{(j)})$ there is an interval $I_j \subset \mathbb{R}$ of length $\ll_\varepsilon d^{\varepsilon}$ with the following property:

If $(x_1,x_2) \in (\mathscr{G}_d \cap X_{<H})^2$ with $d(x_1,x_2) < \delta$ have representatives $(g_1,g_2)$ satisfying (3.1) for which the associated forms $q_i = \sqrt{d}\,g_i.q_0$ are different, then $x_1 = \Gamma g_1^{(j)}a_t$ for some $j$ and some $t \in I_j$.

Indeed, $(\gamma.q_1,\gamma.q_2) = (q_1^{(j)}, q_2^{(j)})$ for some $\gamma \in \Gamma$ and some $j \in [1,k]$ and so $g_1 = \gamma^{-1}g_1^{(j)}a_t$ resp. $g_2 \in \gamma^{-1}g_2^{(j)}A$. By assumption on $g_1, g_2$ we have $d(g_1^{(j)}a_t, g_2^{(j)}A) < \delta$.

Using the claim and a fixed Haar measure of $A$ (i.e. before normalization) we get that the measure of the collection of points $(x_1,x_2) \in (\mathscr{G}_d \cap X_{<H})^2$, which can be represented as $x_i = \Gamma g_i$ with $g_i$ as in (3.1) and for which the associated quadratic forms are different, is

$$\le \sum_{j=1}^{k}|I_j|\,\delta \ll_\varepsilon k\,d^{\varepsilon}\delta \ll_\varepsilon d^{1+3\varepsilon}H^4\delta^3.$$

Therefore, by dividing the above by the total volume of $(\mathscr{G}_d)^2$, the claim (together with the analysis of the case $q_1 = q_2$) implies the proposition. □

4. An ergodic theoretic proof of Duke's theorem 

4.1. Entropy and the unique measure of maximal entropy. A basic underlying concept in our proof is that of entropy. We recall that if $\mathcal{P}$ is a partition of the probability space $(X,\nu)$, the entropy of $\mathcal{P}$ is defined as

$$H_\nu(\mathcal{P}) = \sum_{S \in \mathcal{P}} -\nu(S)\log\nu(S).$$

It is clear that $H_\nu(\mathcal{P}) = H_\nu(T^{-1}\mathcal{P})$ if $T : X \to X$ preserves $\nu$; below we will use this fact without explicit reference. We note for future reference that entropy is controlled by an $L^2$-norm

$$H_\nu(\mathcal{P}) \ge -\log\sum_{S \in \mathcal{P}}\nu(S)^2 \tag{4.1}$$

as one easily sees from convexity of the logarithm map. Moreover, entropy has the following basic subadditivity property: if $\mathcal{P}_1, \mathcal{P}_2$ are two partitions, then

$$H_\nu(\mathcal{P}_1 \vee \mathcal{P}_2) \le H_\nu(\mathcal{P}_1) + H_\nu(\mathcal{P}_2), \tag{4.2}$$

where $\vee$ denotes common refinement.
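For finite partitions these quantities are straightforward to compute. The Python sketch below evaluates the entropy of a partition from the masses of its atoms, the $L^2$ lower bound $-\log\sum_S\nu(S)^2$, and the masses of a common refinement given a joint table; the function names and the table representation are assumptions made for illustration.

```python
import math

def entropy(masses):
    """Sum of -m log m over the atoms of a partition (zero-mass atoms contribute 0)."""
    return sum(-m * math.log(m) for m in masses if m > 0)

def l2_lower_bound(masses):
    """The L^2 bound: minus the logarithm of the sum of squared masses."""
    return -math.log(sum(m * m for m in masses))

def join_masses(joint):
    """Atom masses of the common refinement, where joint[i][j] is the
    measure of the intersection of the i-th atom of P1 with the j-th atom of P2."""
    return [m for row in joint for m in row]
```

With `joint = [[0.4, 0.1], [0.1, 0.4]]` both marginal partitions have masses `[0.5, 0.5]`, and one can check that `entropy(join_masses(joint))` does not exceed the sum of the two marginal entropies, as subadditivity predicts.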

If $T$ is a measure-preserving transformation of $(X,\nu)$, then the measure theoretic entropy of $T$ is defined as:

$$h_\nu(T) = \sup_{\mathcal{P}}\lim_{n\to\infty}\frac{H_\nu(\mathcal{P}\vee T^{-1}\mathcal{P}\vee\cdots\vee T^{-(n-1)}\mathcal{P})}{n} \tag{4.3}$$

where the supremum is taken over all finite partitions of $X$. We also note that the limit in the definition exists and is equal to the infimum because the sequence

$$a_n = H_\nu(\mathcal{P}\vee T^{-1}\mathcal{P}\vee\cdots\vee T^{-(n-1)}\mathcal{P})$$

is subadditive (i.e. $a_{n+m} \le a_n + a_m$).

A key role in our argument is played by the fact that the uniform measure on $\Gamma\backslash\mathrm{SL}_2(\mathbb{R})$ for any lattice $\Gamma$ can be distinguished using entropy, as it is the unique measure of maximal entropy:

Theorem 4.1. Let $X = \Gamma\backslash\mathrm{SL}_2(\mathbb{R})$ be a quotient by a lattice $\Gamma < \mathrm{SL}_2(\mathbb{R})$, and let $T$ denote the time-one-map of the geodesic flow, i.e. right translation

$$T(x) = x\begin{pmatrix} e^{1/2} & \\ & e^{-1/2}\end{pmatrix}.$$

Then for any invariant measure $\nu$ the entropy satisfies $h_\nu(T) \le 1$ where equality holds if and only if $\nu = \mu_X$ is the $\mathrm{SL}_2(\mathbb{R})$-invariant probability measure on $X$.

The inequality $h_\nu(T) \le 1$ is not hard and can be proved in many ways. Identifying the uniform measure as the unique measure where this maximum is attained is somewhat more delicate. We give a self-contained treatment in Appendix B.



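4.2. Proof of Duke's theorem, an outline. Let $T : X \to X$ denote the time-one-map of the geodesic flow as in Theorem 4.1. Recall that

$$U^- = \left\{\begin{pmatrix}1 & t\\ 0 & 1\end{pmatrix} : t \in \mathbb{R}\right\}\quad\text{resp.}\quad U^+ = \left\{\begin{pmatrix}1 & 0\\ t & 1\end{pmatrix} : t \in \mathbb{R}\right\}$$

are the stable, resp. unstable horocycle subgroups. The orbits of these two subgroups give the foliation into stable and unstable manifolds in the following sense. If $u = u(t) \in U^-$, then the distance between $T^n(x)$ and $T^n(xu)$ converges rapidly to zero:

$$d(T^n(x),T^n(xu)) = d\left(x\begin{pmatrix}e^{n/2}&\\&e^{-n/2}\end{pmatrix},\ xu\begin{pmatrix}e^{n/2}&\\&e^{-n/2}\end{pmatrix}\right) \le d\left(\begin{pmatrix}1&0\\0&1\end{pmatrix},\ \begin{pmatrix}e^{-n/2}&\\&e^{n/2}\end{pmatrix}u\begin{pmatrix}e^{n/2}&\\&e^{-n/2}\end{pmatrix}\right) = d\left(\begin{pmatrix}1&0\\0&1\end{pmatrix},\ \begin{pmatrix}1&e^{-n}t\\0&1\end{pmatrix}\right).$$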
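The contraction along the stable horocycle direction comes from a one-line matrix conjugation, which can be verified numerically. The sketch below assumes the time-one map is right multiplication by $a = \mathrm{diag}(e^{1/2}, e^{-1/2})$, so that the displacement between $T^n(x)$ and $T^n(xg)$ is $a^{-n}ga^{n}$; the helper names are hypothetical.

```python
import math

def mat_mul(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def a_power(n):
    """a^n for a = diag(e^{1/2}, e^{-1/2})."""
    return ((math.exp(n / 2), 0.0), (0.0, math.exp(-n / 2)))

def conjugate(g, n):
    """Return a^{-n} g a^{n}, the displacement between T^n(x) and T^n(xg)."""
    return mat_mul(mat_mul(a_power(-n), g), a_power(n))
```

For an upper triangular unipotent `g = ((1, t), (0, 1))` the off-diagonal entry of `conjugate(g, n)` is $e^{-n}t$, while for a lower triangular one it is $e^{n}t$, matching the roles of the stable and unstable subgroups.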

To give an outline of our argument, it is perhaps preferable to simplify the situation. In our proof, the noncompact nature of our space $X$ is a significant complication, so instead of considering the quotient $\mathrm{SL}_2(\mathbb{Z})\backslash\mathrm{SL}_2(\mathbb{R})$ for the purposes of this outline let us consider a compact quotient $X = \Gamma\backslash\mathrm{SL}_2(\mathbb{R})$ on which we have a sequence of $T$-invariant probability measures $\mu_d$ satisfying the following simplified version of the conclusion of Proposition 3.6

$$\mu_d\times\mu_d\{(x,y) \in X^2 : d_X(x,y) < \delta\} \ll_\varepsilon \delta^3d^{\varepsilon}\quad\text{for } \delta > d^{-1/4}. \tag{4.4}$$

Let $r > 0$ be an injectivity radius of $X$ so that for any $x \in X$ the map $B_r^G(e) \to X$ sending $g$ to $xg$ is injective (with $G = \mathrm{SL}_2(\mathbb{R})$, and $B_r^G(e)$ denoting a ball of radius $r$ in $G$). Also assume $\eta < r$ is small enough so that $B_\eta^G(e)$ is an injective image under the exponential map of a neighborhood of $0$ in the Lie algebra.

Let $\mathcal{P}$ be a finite measurable partition all of whose elements have "diameter smaller than $\eta$", i.e. if $x$ and $y = xg$ with $g \in B_r^G(e)$ belong to the same element of $\mathcal{P}$ then $g \in B_\eta^G(e)$. Assume that the same holds as well for $T^i(x)$ and $T^i(y)$ for $i = -N,\dots,0,1,\dots,N$. Then $d(T(x),T(y)) < \eta$ and $d(e, a^{-1}ga) < r$ so that $a^{-1}ga \in B_\eta^G(e)$. Repeating this implies that

$$g \in B_N = \bigcap_{n=-N}^{N}\begin{pmatrix}e^{1/2}&\\&e^{-1/2}\end{pmatrix}^{-n}B_\eta^G(e)\begin{pmatrix}e^{1/2}&\\&e^{-1/2}\end{pmatrix}^{n}.$$

We define a Bowen $N$-ball to be the translate $xB_N$ for some $x \in X$.

Notice that the set $B_N$ is "tube-like": it has width at most $e^{-N}\eta$ along the stable and unstable directions, but is of length $\eta$ in the direction $A$ of the geodesic flow.
The above shows that every element of the partition

$$\mathcal{P}^{[-N,N]} = \bigvee_{n=-N}^{N}T^{-n}\mathcal{P} \tag{4.5}$$

is contained in a single Bowen $N$-ball. Together we conclude that

$$\bigcup_{S \in \mathcal{P}^{[-N,N]}} S\times S \subset \bigcup_{i=1}^{k}\{(x,ya_i) : d(x,y) < \eta e^{-N}\}$$

where $k \ll e^{N}$ and $a_1,\dots,a_k \in B_\eta^A(1)$ are chosen to be $\delta$-dense, that is to say, the union of the $\delta$-neighbourhoods around the $a_i$ covers $B_\eta^A(1)$. Together with (4.4) this shows that

$$\sum_{S \in \mathcal{P}^{[-N,N]}}\mu_d(S)^2 \ll_\varepsilon e^{-2N}d^{\varepsilon}$$

whenever $\delta = \eta e^{-N} > d^{-1/4}$ or equivalently $N < \frac14\log d + \log\eta$. We choose $N = \lfloor\frac15\log d\rfloor$ (the "extra space" will be useful in suppressing a $d^{\varepsilon}$). Using (4.1) we have

$$H_{\mu_d}(\mathcal{P}^{[-N,N]}) \ge (2-6\varepsilon)N$$

for large enough $d$.

In this statement we cannot yet let $d \to \infty$ to get a statement about a weak* limit $\mu$, because $N$ is a function of $d$, and so the size of $\mathcal{P}^{[-N,N]}$ increases with $d$. Thus let $N_0 > 1$ be any fixed integer: $[-N,N]$ can be covered by $\lceil N/N_0\rceil$ many translates of $[-N_0,N_0]$. This in turn shows that $\mathcal{P}^{[-N,N]}$ can be obtained as a refinement of the $\lceil N/N_0\rceil$ partitions

$$\mathcal{P}^{[-N,-N+2N_0]},\ \mathcal{P}^{[-N+2N_0,-N+4N_0]},\ \dots$$

(in the obvious generalization of the notation (4.5)). By subadditivity (4.2) (and invariance) this implies

$$H_{\mu_d}(\mathcal{P}^{[-N_0,N_0]}) \ge (2-7\varepsilon)N_0$$

for large enough $d$. By choosing the original partition $\mathcal{P}$ such that $\mu(\partial S) = 0$ for all $S \in \mathcal{P}$ and some weak* limit $\mu$ of the sequence $\mu_d$ we can now take the limit as $d \to \infty$ to obtain

$$H_\mu(\mathcal{P}^{[-N_0,N_0]}) \ge (2-7\varepsilon)N_0\quad\text{for all } \varepsilon > 0 \text{ and } N_0 > 1,$$

i.e. that $h_\mu(T) \ge 1$. Theorem 4.1 can now be invoked to show that $\mu$ must be the $\mathrm{SL}_2(\mathbb{R})$-invariant measure on $X$.
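The passage from the long window $[-N,N]$ to the fixed window $[-N_0,N_0]$ is a covering argument: an interval of length $2N$ is covered by $\lceil N/N_0\rceil$ windows of length $2N_0$, and subadditivity plus invariance then divide the entropy bound by that number. A minimal Python sketch of this bookkeeping, with hypothetical function names:

```python
import math

def window_count(N, N0):
    """Number of windows of length 2*N0 needed to cover [-N, N]."""
    return math.ceil(N / N0)

def windows(N, N0):
    """The windows [-N + 2*j*N0, -N + 2*(j+1)*N0] for j = 0, ..., window_count - 1."""
    return [(-N + 2 * j * N0, -N + 2 * (j + 1) * N0) for j in range(window_count(N, N0))]

def per_window_entropy_bound(total_entropy, N, N0):
    """Lower bound for the entropy of the [-N0, N0] refinement implied by a
    lower bound on the [-N, N] refinement, via subadditivity and invariance."""
    return total_entropy / window_count(N, N0)
```

When $N_0$ divides $N$, a bound of $2N$ on the long window yields exactly $2N_0$ on the short one; otherwise a small loss occurs, which is absorbed by passing from $6\varepsilon$ to $7\varepsilon$.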

We remark that the analysis above works only in the cocompact case; for e.g. $\Gamma = \mathrm{SL}_2(\mathbb{Z})$, there is no global injectivity radius; and no matter how fine one takes the partition $\mathcal{P}$, to cover a single atom of the partition $\mathcal{P}^{[-N,N]}$ one typically needs exponentially many Bowen $N$-balls.

4.3. Proof of Duke's theorem, controlling the time spent near the cusp. 

Passing from the cocompact to the nonuniform case raises two difficulties: 

(i) Why is such a weak* limit a probability measure (indeed, why can't such a sequence of measures $\mu_d$ converge to the zero measure)?
(ii) The proof outline presented in §4.2 used heavily the relation between Bowen $N$-balls and atoms of the partition $\mathcal{P}^{[-N,N]}$ for a finite partition $\mathcal{P}$. How can we adapt this argument to the nonuniform situation where in general many Bowen $N$-balls are needed to cover a partition element $S \in \mathcal{P}^{[-N,N]}$?

It turns out that these two difficulties are not unrelated, and to handle them 
one needs to control the time an orbit spends in the neighborhood of the cusp, so 
that this problem is related to controlling the escape of mass. What is needed is 
the following finitary version of the uniqueness of measure of maximal entropy: 

Theorem 4.2. Suppose $\mu_i$ is a sequence of $A$-invariant measures on $X$, and suppose there is a constant $r > 0$ and a sequence $\delta_i \to 0$ such that for all sufficiently small $\varepsilon > 0$ the "heights" $H_i = \delta_i^{-\varepsilon}$ satisfy

(1) $\mu_i(X_{\ge H_i}) \to 0$, as $i \to \infty$;

(2) $\mu_i\times\mu_i(\{(x,y) \in X_{<H_i}\times X_{<H_i} : d(x,y) < \delta_i\}) \ll_\varepsilon \delta_i^{3-5\varepsilon}$.

Then $\mu_i \to \mu_X$, the $\mathrm{SL}_2(\mathbb{R})$-invariant measure on $X$, as $i \to \infty$.

Clearly, this, Proposition 3.3, and Proposition 3.6 with $\delta = d^{-1/4}$ are sufficient to prove Duke's theorem. Apart from the ideas already discussed in the last section, the main additional step is:

Proposition 4.3. Fix a height $M > 1$. Let $N > 1$ and consider a subset $V \subset [-N,N]$. Then the set

$$Z(V) = \{x \in T^{N}X_{<M} \cap T^{-N}X_{<M} : \text{for all } n \in [-N,N] \text{ we have } T^n(x) \in X_{\ge M} \Leftrightarrow n \in V\}$$

can be covered by $\ll_M e^{2N-\frac12|V|}$ Bowen $N$-balls. Moreover, $Z(V)$ is nonempty for only $\ll_M e^{\frac{2\log\log M}{\log M}N}$ different sets $V \subset [-N,N]$.

In words, $Z(V)$ is the set of points $x \in X$ so that their trajectory $T^{-N}x, T^{-N+1}x, \dots, T^{N}x$ between times $-N$ and $N$ begins and ends below height $M$ and is above height $M$ precisely at the times specified by the set $V$. So the content of the Proposition is that orbits that spend a lot of time in a neighborhood of the cusp in fact can be covered by relatively few tube-like sets. Later we will turn this into the statement that those orbits have relatively little mass.

Note that as the size of $V$ grows the number of Bowen $N$-balls needed to cover $Z(V)$ decreases, though even if $V = [-N+1, N-1]$ it is still exponential; indeed it is $\asymp e^{N}$, which is essentially the square root of the estimate we get for $V = \emptyset$.

We defer the proof of Proposition 4.3 to the next section. A purely ergodic theoretic formulation of this phenomenon is that a lot of mass near the cusp for an invariant probability measure results in a significantly smaller entropy for the geodesic flow. We will give such a formulation in Theorem 5.1; it implies in particular that:

Given a sequence of $T$-invariant probability measures $\mu_i$ with entropies $h_{\mu_i}(T) \ge c$, any weak* limit $\mu$ satisfies $\mu(X) \ge 2c - 1$.

We will discuss in Remark 5.2 why $c = 1/2$ is the critical point for this phenomenon.
4.4. Controlling escape of mass, and maximal entropy. We proceed to the
proof of Theorem 4.2, and start by showing that mass cannot escape, using assump- 
tion (2). We will use (1) of that theorem which gives a mild control on how fast 
mass could possibly escape to be able to apply the covering argument in Proposition 
4.3. That (2) can replace entropy in that argument is not surprising since we have 
already seen in Section 4.2 a relationship between this assumption and entropy. 

Lemma 4.4. Let $\mu_i$ be a sequence of $T$-invariant measures as in Theorem 4.2. Let $\mu$ be a weak* limit of any subsequence of $\mu_i$. Then

$$\mu(X_{<M}) \ge 1 - \frac{2\log\log M}{\log M}$$

for every sufficiently large $M$, and so $\mu$ is a probability measure.

Proof. Fix some $\kappa > \frac{2\log\log M}{\log M}$. We will show that $\mu(X_{<M}) \ge 1 - \kappa$.

We set $N_i = \lceil-\log\delta_i\rceil$ and $H_i = \delta_i^{-\varepsilon}$ for some $\varepsilon > 0$ determined below (more precisely: before the final displayed equation of this proof) in terms of $\kappa$. Notice that a geodesic trajectory of a point $x \in X_{<H_i}$ will visit $X_{<M}$ in less than $2\log H_i - 2\log M \le 2\varepsilon N_i$ steps either in the future or in the past. Hence

$$\bigcup_{n=-\lfloor 2\varepsilon N_i\rfloor}^{\lfloor 2\varepsilon N_i\rfloor}T^{n}X_{<M} \supset X_{<H_i}$$

and so this union contains most of the $\mu_i$-mass according to the assumption (1) of Theorem 4.2.

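Let $N_i' = N_i + \lfloor 2\varepsilon N_i\rfloor$. Then $T^{N_i'}X_{<H_i} \cap T^{-N_i'}X_{<H_i}$ is contained in the union of $\ll (\varepsilon N_i)^2$ many sets of the form $T^{N_i'+n_-}X_{<M} \cap T^{-N_i'+n_+}X_{<M}$ where $|n_-|, |n_+| \le 2\varepsilon N_i$. We apply this to the set

$$X_\kappa = \Big\{x \in T^{N_i'}X_{<H_i} \cap T^{-N_i'}X_{<H_i} : \frac{1}{2N_i'+1}\sum_{n=-N_i'}^{N_i'}1_{X_{\ge M}}(T^nx) > \kappa\Big\}$$

consisting of points that spend an unexpectedly high portion of $[-N_i', N_i']$ above $M$. We wish to estimate $\mu_i(X_\kappa)$. $X_\kappa$ is also a union of sets of the form

$$Z' = X_\kappa \cap T^{N_i'+n_-}X_{<M} \cap T^{-N_i'+n_+}X_{<M}$$

with $n_-, n_+$ as before. It suffices to estimate $\mu_i(Z')$ for some fixed $n_-, n_+$. Replacing $Z'$ by an appropriate shift $Z := T^kZ'$ we may consider instead $Z \subset T^{N}X_{<M} \cap T^{-N}X_{<M}$ where $N \in [N_i, N_i + 4\varepsilon N_i]$. Adjusting the condition on the "average time spent above $M$" appropriately,

$$Z \subset \Big\{x \in T^{N}X_{<M} \cap T^{-N}X_{<M} : \frac{1}{2N+1}\sum_{n=-N}^{N}1_{X_{\ge M}}(T^nx) > \kappa - O(\varepsilon)\Big\}.$$

To the right-hand set we apply Proposition 4.3, which shows that $Z$ is covered by

$$\ell \ll_M e^{\frac{2\log\log M}{\log M}N}e^{2N-(\kappa-O(\varepsilon))N} \le e^{\frac{2\log\log M}{\log M}N_i + 2N_i - \kappa N_i + O(\varepsilon)N_i}$$

many Bowen $N$-balls. Because $N \ge N_i$, we may also cover $Z$ by $\ell$ many Bowen $N_i$-balls $S_1,\dots,S_\ell$.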
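Since Bowen $N_i$-balls have thickness $\le e^{-N_i} \le \delta_i$ along stable and unstable horocycle directions and thickness $\ll 1$ along $A$, we get that

$$\bigcup_{j=1}^{\ell}S_j\times S_j \subset \bigcup_{j=1}^{k}\{(x,ya_j) : d(x,y) < \delta_i\}$$

where $k \ll e^{N_i}$ and $a_j \in B_\eta^A(1)$ are $\delta_i$-dense. This remains true if we make the sets $S_j$ disjoint by replacing $S_2$ by $S_2' = S_2\setminus S_1$, $S_3$ by $S_3' = S_3\setminus(S_1\cup S_2)$, .... By our assumption (2) we now get

$$\sum_{j=1}^{\ell}\mu_i(S_j')^2 \ll_\varepsilon \delta_i^{3-5\varepsilon}k \ll e^{-2N_i+5\varepsilon N_i}.$$

Therefore, by Cauchy-Schwarz

$$\mu_i(Z) \le \sum_{j=1}^{\ell}\mu_i(S_j') \le \Big(\sum_{j=1}^{\ell}\mu_i(S_j')^2\Big)^{1/2}\ell^{1/2} \ll_{\varepsilon,M} e^{\left(\frac{\log\log M}{\log M} - \frac{\kappa}{2} + O(\varepsilon)\right)N_i}.$$

Going through all possibilities for $n_-, n_+$ (of which there are $\ll e^{\varepsilon N_i}$ many) this implies

$$\mu_i(X_\kappa) \ll_{\varepsilon,M} e^{\left(\frac{\log\log M}{\log M} - \frac{\kappa}{2} + O(\varepsilon)\right)N_i}.$$

Given that we assume $\kappa > \frac{2\log\log M}{\log M}$ we can choose $\varepsilon > 0$ small enough such that the exponent in the above expression is negative so that the measure goes to zero for $i \to \infty$ (since $N_i \to \infty$). By definition of $X_\kappa$ we have

$$\mu_i(X_{\ge M}) = \int 1_{X_{\ge M}}\,d\mu_i = \int\frac{1}{2N_i'+1}\sum_{n=-N_i'}^{N_i'}1_{X_{\ge M}}\circ T^n\,d\mu_i \le \kappa + \mu_i(X_\kappa) + 2\mu_i(X_{\ge H_i}),$$

which when $i \to \infty$ implies that $\mu(X_{<M}) \ge 1 - \kappa$ for any $\kappa > \frac{2\log\log M}{\log M}$. This gives the lemma. □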
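The proof hinges on the sign of the coefficient $\frac{\log\log M}{\log M} - \frac{\kappa}{2} + O(\varepsilon)$ of $N_i$ in the exponent. The following Python sketch evaluates the threshold $\frac{2\log\log M}{\log M}$ and that coefficient for given $M$ and $\kappa$; the function names are hypothetical and the $O(\varepsilon)$ term is modelled by a user-supplied number `eps_term`, which is an assumption of the sketch.

```python
import math

def escape_threshold(M):
    """The quantity 2 log log M / log M bounding the mass above height M (needs M > e)."""
    return 2 * math.log(math.log(M)) / math.log(M)

def decay_exponent(M, kappa, eps_term=0.0):
    """Coefficient of N_i in the exponent: log log M / log M - kappa / 2 + eps_term."""
    return math.log(math.log(M)) / math.log(M) - kappa / 2 + eps_term
```

The coefficient is negative exactly when $\kappa$ exceeds the threshold by more than twice `eps_term`, and the threshold itself tends to $0$ as $M$ grows, which is why the limit measure loses no mass.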

We indicated in Section 4.2 how the elements of the refinement $\bigvee_{n=-N}^{N}T^{-n}\mathcal{P}$ are related to Bowen $N$-balls; but that analysis fails in the noncompact case, when trajectories visit the cusp. We now discuss the general case.

Lemma 4.5. For every $M > 1$ there exists a finite partition $\mathcal{P}$ of $X$ such that for every $\kappa \in (0,1)$ and every $N$, "most elements of the refinement $\bigvee_{n=-N}^{N}T^{-n}\mathcal{P}$ are controlled by Bowen $N$-balls":

There exists a set $X' \subset X$ so that:

- $X'$ is a union of $S_1,\dots,S_\ell \in \bigvee_{n=-N}^{N}T^{-n}\mathcal{P}$;

- Each such $S_j$ is contained in a union of at most $3^{\kappa(2N+1)}$ many Bowen $N$-balls;

- $\mu(X') \ge 1 - 2\mu(X_{\ge M})\kappa^{-1}$ for every invariant probability measure $\mu$.

For a given $\mu$ the choice of $\mathcal{P}$ can be made such that the boundaries of all sets of $\mathcal{P}$ have zero measure.

Proof. We define $\mathcal{P} = \{Q, P_1,\dots,P_k\}$ where $Q = X_{\ge M}$ and $\{P_1,\dots,P_k\}$ is a measurable partition of $X_{<M}$ whose elements have diameter less than $\eta$ where $\eta$ is small enough in comparison to the injectivity radius of $X_{<M}$ (in the same sense as in the discussion in Section 4.2).

Note that the boundary of $Q$ is a null set for every probability measure $\mu$ that is invariant under the geodesic flow. This is because every trajectory hits the boundary of $Q$ in a countable set. Also, given $\mu$ we can find for every point $x \in X_{<M}$ an $\varepsilon < \eta/2$ so that the boundary of $B_\varepsilon(x)$ has measure zero. Applying compactness we construct $P_1,\dots,P_k$ from the algebra generated by finitely many such balls.

We claim that $S \in \mathcal{P}_N = \bigvee_{n=-N}^{N}T^{-n}\mathcal{P}$ has the property that any two points $x, y \in S$ satisfy

$$T^nx \in X_{<M} \Leftrightarrow T^ny \in X_{<M}\quad\text{for } n \in [-N,N]$$

and

$$d(T^nx,T^ny) < \eta\quad\text{whenever } T^nx, T^ny \in X_{<M} \text{ and } n \in [-N,N].$$

Therefore, the average $f(x) = \frac{1}{2N+1}\sum_{n=-N}^{N}1_{X_{\ge M}}(T^nx)$ is constant on sets of $\mathcal{P}_N$. We define

$$X' = \{x \in T^{-N}X_{<M} : f(x) \le \kappa\}.$$

If $\mu$ is an invariant probability measure, invariance implies $\int f(x)\,d\mu = \mu(X_{\ge M})$ and so $\mu(\{x : f(x) > \kappa\}) \le \mu(X_{\ge M})\kappa^{-1}$. Therefore, $X'$ has measure $\mu(X') \ge 1 - \mu(X_{\ge M}) - \mu(X_{\ge M})\kappa^{-1}$.

Consider now an element S G &n with S C X'. After taking the image of 5 
under T we have for any x, y G S' = T N S that 

^ 2N 

(46 ) ^ X < M ' SATT^ 1 — ^^^^ 

^ ' n— 

d(T n x,T n y) < n whenever T n x,T n y G A <M and n G [0, 2A]. 

Let F = {ne [0, 2A] : T n S' C A>^/}. We can now show inductively that for every 
n G [0, 27V] the set S' is contained in a union of 3l[ <"- 1 ] nv l many sets of the form 

xB^ e - n B^ A where x G S". 

We will refer to these sets as forward Bowen n-balls and to x as its center. For 
n = we have nothing to show (for notice that we allowed a bigger radius in the 
subgroups U + and U~ A). Suppose the claim holds for some n and let x G S' be 
a center of one of the forward Bowen n-balls. If T n+1 x G A <M then T n+1 S' C P, 
for i > 1 and it follows easily that any point y = xu + g G S' with ?i + G 7?^ e _„ 

and 5 G 73^, A satisfies u + G ( assum i n g again that ij is small enough in 

comparison with the injectivity radius). If T n+1 x £ X>m then we can cover the 
forward Bowen n-ball by 3 forward Bowen (n + l)-balls. 

Recall that for $S \subset X'$ we have $|V| \le \kappa(2N+1)$ and so by taking the preimages of $S' = T^{N}S$ and the forward Bowen $2N$-balls obtained the lemma follows.

□ 

To prove Theorem 4.2 it remains to establish the following lemma and combine 
it with Lemma 4.4 and Theorem 4.1. 

Lemma 4.6. A weak* limit $\mu$ of a subsequence of the invariant probability measures $\mu_i$ as in Theorem 4.2 has maximal entropy $h_\mu(T) = 1$.

Proof. Let $\mathcal{P}$ be as in Lemma 4.5. Set $N_i = \lceil-\log\delta_i\rceil$ and define

$$\mathcal{P}_{N_i} = \bigvee_{n=-N_i}^{N_i}T^{-n}\mathcal{P}.$$

We wish to show that $H_{\mu_i}(\mathcal{P}_{N_i})$ is large by using Lemma 4.5 and assumption (2). Let $\kappa = \mu(X_{\ge M})^{1/2}$ for some weak* limit $\mu$ and define $X_i$ as in Lemma 4.5 using $N = N_i$.

For any $S \in \mathcal{P}_{N_i}$ with $S \subset X_i$ there exists a cover of $S$ consisting of $\le 3^{\kappa(2N_i+1)}$ many Bowen $N_i$-balls; so there is a partition of $S$ into $\le 3^{\kappa(2N_i+1)}$ sets, each a subset of a Bowen $N_i$-ball. We define the partition $Q_i$ as the partition consisting of all $S \in \mathcal{P}_{N_i}$ with $S \subset X\setminus X_i$, and all elements of these partitions of $S$ for any $S \subset X_i$. It follows that

$$H_{\mu_i}(Q_i\,|\,\mathcal{P}_{N_i}) = \sum_{S \in \mathcal{P}_{N_i}}\mu_i(S)H_{\mu_i|_S}(Q_i) \le \kappa(2N_i+1)\log 3. \tag{4.7}$$

Also since $Q_i$ is a finer partition than $\mathcal{P}_{N_i}$ we have

$$H_{\mu_i}(Q_i) = H_{\mu_i}(Q_i\vee\mathcal{P}_{N_i}) = H_{\mu_i}(\mathcal{P}_{N_i}) + H_{\mu_i}(Q_i\,|\,\mathcal{P}_{N_i}), \tag{4.8}$$

which together with (4.7) indicates that we wish to show that $H_{\mu_i}(Q_i)$ is large.

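Here we will use the assumption (2) from Theorem 4.2; but the elements of $Q_i$ that lie outside $X_i$ can be irregularly shaped, requiring a further estimate:

$$H_{\mu_i}(Q_i) \ge H_{\mu_i}(Q_i\,|\,\{X_i, X\setminus X_i\}) \ge \mu_i(X_i)H_{\mu_i|_{X_i}}(Q_i). \tag{4.9}$$

Using (4.1) for the restriction $\mu_i|_{X_i}$ we see that

$$H_{\mu_i|_{X_i}}(Q_i) \ge -\log\sum_{S \in Q_i,\,S \subset X_i}\Big(\frac{\mu_i(S)}{\mu_i(X_i)}\Big)^2. \tag{4.10}$$

By construction of $Q_i$ every $S \in Q_i$ with $S \subset X_i$ is a subset of a Bowen $N_i$-ball. Proceeding as in Section 4.2 it follows that

$$\bigcup_{S \in Q_i,\,S \subset X_i}S\times S \subset \bigcup_{j=1}^{k}\{(x,ya_j) : d(x,y) < \delta_i\}$$

where $k \ll e^{N_i}$ and $a_1,\dots,a_k \in B_\eta^A(1)$ are chosen to be $\delta_i$-dense. Together with assumption (2) of Theorem 4.2 this shows

$$\sum_{S \in Q_i,\,S \subset X_i}\mu_i(S)^2 \ll_\varepsilon e^{-(2-5\varepsilon)N_i}.$$

Let $C_\varepsilon$ be the implicit constant here, that is to say,

$$\sum_{S \in Q_i,\,S \subset X_i}\mu_i(S)^2 \le C_\varepsilon e^{-(2-5\varepsilon)N_i}.$$

Then, taking into account (4.9)-(4.10),

$$H_{\mu_i}(Q_i) \ge 2\mu_i(X_i)\log\mu_i(X_i) - \mu_i(X_i)\log C_\varepsilon + \mu_i(X_i)(2-5\varepsilon)N_i.$$

Here the first two terms are bounded, so for large enough $i$

$$H_{\mu_i}(Q_i) \ge \mu_i(X_i)(2-6\varepsilon)N_i \ge (1 - 2\kappa^{-1}\mu_i(X_{\ge M}))(2-6\varepsilon)N_i$$

where we also used the estimate for $X_i$ in Lemma 4.5. Combining this with (4.8) and (4.7) we get

$$H_{\mu_i}\Big(\bigvee_{n=-N_i}^{N_i}T^{-n}\mathcal{P}\Big) \ge (1 - 2\kappa^{-1}\mu_i(X_{\ge M}))(2-6\varepsilon)N_i - O(\kappa N_i).$$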
Now fix some integer $N_0 > 1$. Using subadditivity of entropy we have for any large enough $i$ that

$$H_{\mu_i}\Big(\bigvee_{n=-N_0}^{N_0}T^{-n}\mathcal{P}\Big) \ge (1 - 2\kappa^{-1}\mu_i(X_{\ge M}))(2-6\varepsilon)N_0 - O(\kappa N_0) - \varepsilon N_0.$$

This is now a statement involving only finitely many test functions, namely the characteristic functions of all elements of $\bigvee_{n=-N_0}^{N_0}T^{-n}\mathcal{P}$ and of $X_{\ge M}$. Since there is no escape of mass by Lemma 4.4 and since we can assume without loss of generality that all boundaries have zero measure for the weak* limit $\mu$, by Lemma 4.5, we get the same estimate for $\mu$. Dividing by $2N_0$ and letting $N_0$ now go to infinity we arrive at

$$h_\mu(T) \ge (1 - 2\mu(X_{\ge M})^{1/2})(1-3\varepsilon) - O(\mu(X_{\ge M})^{1/2}) - \varepsilon$$

for any $M > 1$ and $\varepsilon > 0$.

Since $\mu(X_{\ge M})$ can be made arbitrarily small, it follows that $h_\mu(T) \ge 1$, i.e. $T$ has maximal entropy. □

5. Trajectories spending time high in the cusp, and a proof of Proposition 4.3.

Apart from the characterization of the Haar measure as the unique measure of maximal entropy in Theorem 4.1 the main technical estimate needed to prove Theorem 4.2 is Proposition 4.3. We recall that this proposition states that the set

$$Z(V) = \{x \in T^{N}X_{<M} \cap T^{-N}X_{<M} : \text{for all } n \in [-N,N] \text{ we have } T^n(x) \in X_{\ge M} \Leftrightarrow n \in V\}$$

can be covered by $\ll_M e^{2N-\frac12|V|}$ Bowen $N$-balls.

In addition to proving this, we shall also prove here the promised purely ergodic formulation of "high entropy inhibits escape of mass," namely:

Theorem 5.1. Let $T$ be the time-one-map for the geodesic flow. There exists some $M_0$ with the property that

$$h_\mu(T) \le 1 + \frac{\log\log M}{\log M} - \frac{\mu(X_{\ge M})}{2}$$

for any invariant probability measure $\mu$ on $X = \mathrm{SL}(2,\mathbb{Z})\backslash\mathrm{SL}(2,\mathbb{R})$ for the geodesic flow and any $M \ge M_0$. In particular, for a sequence of $T$-invariant probability measures $\mu_i$ with entropies $h_{\mu_i}(T) \ge c$, any weak* limit $\mu$ satisfies $\mu(X) \ge 2c - 1$.
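To see how the "in particular" statement follows, one can solve the entropy inequality for the mass above height $M$. The Python sketch below assumes the inequality in the form $h_\mu(T) \le 1 + \frac{\log\log M}{\log M} - \frac{\mu(X_{\ge M})}{2}$ (our reading of the theorem, stated here as an assumption); the function names are hypothetical.

```python
import math

def entropy_upper_bound(M, cusp_mass):
    """Assumed bound: 1 + log log M / log M - cusp_mass / 2."""
    return 1 + math.log(math.log(M)) / math.log(M) - cusp_mass / 2

def max_cusp_mass(c, M):
    """Largest mass above height M compatible with entropy >= c under that bound."""
    return min(1.0, 2 * (1 - c) + 2 * math.log(math.log(M)) / math.log(M))
```

Since `1 - max_cusp_mass(c, M)` tends to $2c - 1$ as $M \to \infty$, measures with entropy at least $c$ keep mass at least about $2c - 1$ below height $M$, uniformly along the sequence, which is the source of the bound $\mu(X) \ge 2c - 1$ for weak* limits.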



Remark 5.2. Roughly speaking 1/2 is the critical point for Theorem 5.1 because 
the "upward" and "downward" parts of a trajectory, that goes high in the cusp, are 
strongly related to each other. In fact, in the case of a p-adic flow this phenomenon 
is easy to explain. 

We consider another dynamical system of similar flavor: here the space will be⁴

$Y = PGL_2(\mathbb{Z}[1/p]) \backslash PGL_2(\mathbb{R}) \times PGL_2(\mathbb{Q}_p)$

and the action will be by multiplication on the right of the $PGL_2(\mathbb{Q}_p)$-component
by an element $a_p \in PGL_2(\mathbb{Q}_p)$. Let $M < PGL_2(\mathbb{R}) \times PGL_2(\mathbb{Q}_p)$ be the product of $PO_2(\mathbb{R})$ and
the group of diagonal matrices in $PGL_2(\mathbb{Z}_p)$. There is a natural right $M$-invariant
projection $\pi : Y \to PSL_2(\mathbb{Z})\backslash\mathbb{H}$, and on this latter space we have the Hecke
correspondence which attaches to a point $z \in PSL_2(\mathbb{Z})\backslash\mathbb{H}$ a set $T_p(z)$ of $p + 1$ new
points, namely if $\tilde z \in \mathbb{H}$ is a representative of $z$ then

(5.1) $T_p(z) = PSL_2(\mathbb{Z})\backslash\{p\tilde z,\ \tilde z/p,\ (\tilde z + 1)/p,\ \dots,\ (\tilde z + p - 1)/p\}.$

The space $Y/M$ can be identified with the set of infinite sequences $\dots, y_{-1}, y_0, y_1, \dots$
with $y_i \in T_p(y_{i-1}) \setminus \{y_{i-2}\}$, and under this identification multiplication by $a_p$ in
the $p$-direction becomes simply the shift action. This in particular shows that
multiplication by $a_p$ on $Y/M$ (or, with a bit more effort, on $Y$) has entropy $\le \log p$, and
just like in our case this maximum is attained for the Haar measure on $Y$. From
(5.1) it is clear that if $y \in PSL(2,\mathbb{Z})\backslash\mathbb{H}$ is high up in the cusp, precisely 1 of its
$T_p$-points will be higher in the cusp, and $p$ of these points will be lower than $y$
in the cusp. Therefore if $\dots, y_0, y_1, \dots$ is a sequence of points of $PSL(2,\mathbb{Z})\backslash\mathbb{H}$
as above and if the $y_k$ are high up in the cusp for some contiguous range of $k$'s, say
$n < k < m$, then in this range given the value of $y_k$ there is only one possible way of
choosing $y_{k+1}$ so that it is higher than $y_k$. Moreover, since by assumption $y_{k+2} \ne y_k$, once
$y_{k+1}$ is lower than $y_k$, the point $y_{k+2}$, being in $T_p(y_{k+1})$ but excluded from being
$y_k$, which is the unique point in $T_p(y_{k+1})$ higher than $y_{k+1}$, must be lower than $y_{k+1}$.
Hence if $y_{k+1}$ is lower than $y_k$ for some $k$ in the above range, then $y_{k'+1}$ must be
lower than $y_{k'}$ for all $k'$ in the range $k < k' < m$. From the above discussion it
follows that while the trajectory is high up in the cusp, we have a choice of which
subsequent point to choose only half of the time, hence the factor $\frac12$.

⁴ For technical reasons, it is preferable to use $PGL_2$ here rather than $SL_2$.
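
The counting heuristic of Remark 5.2 can be tried out on a toy model. The sketch below is only an illustration under stated assumptions: heights are integers, every point has one higher neighbour and `p` lower ones, and backtracking is forbidden. The function name `count_high_excursions` is hypothetical and does not come from the paper. It counts the trajectories of length `2n` that start and end at the reference height without dipping below it; the count grows like `p**n`, i.e. at exponential rate `(1/2) log p` per step rather than `log p`.

```python
import math


def count_high_excursions(p: int, n: int) -> int:
    """Count toy trajectories of 2n steps that return to the starting height.

    Assumed rules (a simplification of the Hecke picture):
      * an upward step has exactly 1 choice;
      * a downward step has p choices, minus the point we just came from
        if the previous step was upward (no backtracking);
      * after a downward step the only higher neighbour is the previous
        point, so the next step must again be downward.
    """
    # state: (height, last_move) -> number of trajectories
    states = {(0, None): 1}
    for _ in range(2 * n):
        nxt = {}
        for (height, last), ways in states.items():
            moves = []
            if last != "down":
                moves.append((height + 1, "up", 1))
            down_choices = p - 1 if last == "up" else p
            moves.append((height - 1, "down", down_choices))
            for new_height, move, mult in moves:
                if new_height < 0:  # the trajectory must stay high
                    continue
                key = (new_height, move)
                nxt[key] = nxt.get(key, 0) + ways * mult
        states = nxt
    return sum(ways for (height, _), ways in states.items() if height == 0)


if __name__ == "__main__":
    p, n = 3, 6
    count = count_high_excursions(p, n)
    print(count, math.log(count) / (2 * n), 0.5 * math.log(p))
```

In this model the only admissible shape is `n` steps up followed by `n` steps down, which is why the count equals `(p - 1) * p**(n - 1)`.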

5.1. Proof of Proposition 4.3: the number of possible sets V. The easiest
part of Proposition 4.3 is the final assertion, i.e. if we write

$Q_{M,N} = \bigvee_{n=-N}^{N} T^{-n}\{X_{>M}, X_{<M}\},$

then the above partition $Q_{M,N}$ has $\ll_M e^{\frac{2\log\log M}{\log M}N}$ many elements.

We make use of the fundamental domain in $PSL_2(\mathbb{R})$ from §1.3; the geodesic
flow on $X$ corresponds to following the geodesic determined by $(z,v)$ until the
boundary of the fundamental region is reached, at which point one applies either
$\begin{pmatrix} 1 & \pm 1\\ 0 & 1\end{pmatrix}$ to shift the geodesic horizontally or $\begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}$ to reflect on the bottom
boundary of the fundamental region.

The basic point in the proof is that if $x \in X$ satisfies $\mathrm{ht}(x) > M$, then $\mathrm{ht}(T^n x) > 1$
so long as $n < \lfloor 2\log M\rfloor$, i.e. one needs at least $\lfloor 2\log M\rfloor$ steps to reach points
of height less than 1.

Therefore, in a time interval of length $2\lfloor 2\log M\rfloor$ there can be only one stretch
of times for which the points on the orbit are of height at least $M$. In other words
the possible starting and end points of that time interval completely determine an
element of $Q_{M,\lfloor 2\log M\rfloor}$, which therefore has at most $\ll \log^2 M$, say $\le c_0\log^2 M$,
many elements. To obtain the lemma we note that $Q_{M,N}$ can be obtained by
taking refinements of at most $\frac{2N}{4\log M - 1}$ many images and pre-images of
$Q_{M,\lfloor 2\log M\rfloor}$ and at most $2\lfloor 2\log M\rfloor$ many of $\{X_{>M}, X_{<M}\}$. We get that $Q_{M,N}$
has size $\ll_M (c_0\log^2 M)^{\frac{2N}{4\log M - 1}}$, which is at most $e^{\frac{2\log\log M}{\log M}N}$ once $M$ is large
enough.
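
The combinatorial content of this count can be explored with a small experiment. The sketch below is a toy model, not the paper's argument: it counts 0/1 patterns of a given length in which two distinct runs of 1s (stretches of time spent above height $M$) are separated by at least `gap` zeros, where `gap` plays the role of $2\lfloor 2\log M\rfloor$. The names `count_patterns` and `is_admissible` are hypothetical. A dynamic programme over "zeros seen since the last 1" gives the count, and a brute-force check confirms it on short patterns.

```python
from itertools import product
import math


def count_patterns(length: int, gap: int) -> int:
    """Count 0/1 strings whose distinct runs of 1s are >= gap zeros apart."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    # counts[s] = number of prefixes with s zeros since the last 1,
    # capped at gap; index gap also covers "no 1 seen so far".
    counts = [0] * (gap + 1)
    counts[gap] = 1
    for _ in range(length):
        new = [0] * (gap + 1)
        for zeros, ways in enumerate(counts):
            if ways == 0:
                continue
            new[min(zeros + 1, gap)] += ways  # append a 0
            if zeros == 0 or zeros == gap:    # continue a run or start a new one
                new[0] += ways                # append a 1
        counts = new
    return sum(counts)


def is_admissible(bits, gap: int) -> bool:
    """Brute-force version of the same separation condition."""
    ones = [i for i, b in enumerate(bits) if b]
    return all(j - i == 1 or j - i - 1 >= gap for i, j in zip(ones, ones[1:]))


if __name__ == "__main__":
    for gap in (2, 8, 32):
        length = 400
        rate = math.log(count_patterns(length, gap)) / length
        print(gap, round(rate, 4))
```

The printed growth rates shrink as `gap` grows, which is the qualitative behaviour behind the bound on the number of elements of the partition.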

5.2. Proof of Proposition 4.3: covering Z(V) by Bowen balls. Write
$a = \begin{pmatrix} e^{1/2} & 0\\ 0 & e^{-1/2}\end{pmatrix}$, so that $T(x) = xa$. Since $X_{<M}$ has compact closure, it suffices to
restrict ourselves to a neighborhood $O$ of a point $x_0 \in X_{<M}$. By taking the image
under $T^N$ it also suffices to study the forward orbit as follows. We will show that
for the set $V \subset [0, N-1]$ picked, the set

$Z_O^+ = \bigl\{x \in O \cap T^{-N}X_{<M} : \text{for all } n \in [0, N-1] \text{ we have } T^n(x) \in X_{>M} \iff n \in V\bigr\}$

can be covered by $\ll_M e^{N - \frac12|V|}$ forward-Bowen $N$-balls $xB_N^+$ where

$B_N^+ = \bigcap_{n=0}^{N-1} a^{-n}B_\eta^{SL_2(\mathbb{R})}a^{n}.$

We may assume that the neighborhood we will consider is of the form

$O = x_0B^{U^+}_{\eta/2}B^{U^-A}_{\eta/2}$

where $B^H_r$ denotes the $r$-ball of the identity in a subgroup $H < SL_2(\mathbb{R})$, $A$ denotes
the diagonal subgroup, and $U^+$ resp. $U^-$ denote the unstable and stable horocyclic
subgroups as in Section 4.2.

Notice that by applying $T^n$ to $O$ we get a neighborhood of $T^n(x_0)$ for which
the $U^+$-part is $e^n$ times as big while the second part is still contained in $B^{U^-A}_{\eta/2}$. By
breaking the $U^+$-part into $\lceil e^n\rceil$ sets of the form $u_i^+B^{U^+}_{\eta/2}$ for various $u_i^+ \in U^+$ we
can write $T^n(O)$ as a union of $\lceil e^n\rceil$ sets of the form

$T^n(x_0)u_i^+B^{U^+}_{\eta/2}\,a^{-n}B^{U^-A}_{\eta/2}a^{n},$

i.e. we obtain neighborhoods of similar shape. If we take the preimage under $T^n$ of
this set, we obtain a set contained in the forward Bowen $n$-ball $T^{-n}(T^n(x_0)u_i^+)B_n^+$.
We will be iterating this procedure, but using the information that the orbit has to
stay above height $M$ for a long time we will be able to cut down on the number of
$u_i^+ \in U^+$ needed to cover $Z_O^+$.
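
The expansion and contraction used in this paragraph can be checked numerically. The sketch below assumes, as a labelled convention that the text does not spell out here, that $u^+(t)$ is the lower triangular unipotent matrix with entry $t$ and $u^-(s)$ the upper triangular one, with $a = \mathrm{diag}(e^{1/2}, e^{-1/2})$. Under that assumption conjugation by $a^{-n}(\cdot)a^{n}$ multiplies $t$ by $e^{n}$ and $s$ by $e^{-n}$. All helper names are hypothetical.

```python
import math


def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def a_power(n: float):
    """a^n for a = diag(e^{1/2}, e^{-1/2})."""
    return ((math.exp(n / 2), 0.0), (0.0, math.exp(-n / 2)))


def u_plus(t: float):
    """Assumed form of the unstable horocyclic element u^+(t)."""
    return ((1.0, 0.0), (t, 1.0))


def u_minus(s: float):
    """Assumed form of the stable horocyclic element u^-(s)."""
    return ((1.0, s), (0.0, 1.0))


def conjugate(h, n: float):
    """Return a^{-n} h a^{n}."""
    return mat_mul(mat_mul(a_power(-n), h), a_power(n))


if __name__ == "__main__":
    n, t = 3, 0.01
    print(conjugate(u_plus(t), n)[1][0], t * math.exp(n))
    print(conjugate(u_minus(t), n)[0][1], t * math.exp(-n))
    print(math.ceil(math.exp(n)), "pieces of the original size cover the expanded U+ part")
```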

In the proof of the claim we will use a partition of $[0, N]$ into sub-intervals of
two types according to the set $V$. Notice that as in the proof of §5.1 we can assume
that $V$ itself consists of intervals that are separated by $2\lfloor 2\log M\rfloor$. For otherwise
the set $Z_O^+$ is empty since no orbit under $T$ can leave $X_{>M}$ and return to it in a
shorter amount of time. We enlarge every such subinterval of $V$ by $\lfloor 2\log M\rfloor$ on
both sides to obtain the first type of disjoint intervals $I_1, \dots, I_k$. At the end points
$0$ and $N$ we have required that $x, T^N(x) \in X_{<M}$ for all $x \in Z_O^+$. For this reason
we can assume without loss of generality that all of these intervals are contained
in $[0, N]$. (If this is not the case, we can enlarge the interval $[0, N]$ accordingly
and absorb the change of the desired upper estimate in the multiplicative constant
that depends on $M$ alone.) The remainder of $[0, N]$ we collect into the intervals
$J_1, \dots, J_\ell$.
We will go through the time intervals $I_j$ and $J_j$ in their respective order inside
$[0, N]$. At each stage we will divide any of the sets obtained earlier into $\lceil e^{|I_j|}\rceil$-
or $\lceil e^{|J_j|}\rceil$-many sets, and in the case of $I_j$ show that we do not have to keep all
of them. More precisely, we assume inductively that for some $K \le N$ we have
$[0, K] = I_1 \cup \dots \cup I_i \cup J_1 \cup \dots \cup J_j$ and that all points in $Z_O^+$ can be covered by

$\le 2e^{|J_1| + \dots + |J_j| + i\lfloor 2\log M\rfloor + \frac12(|I_1| + \dots + |I_i|)}$

many preimages under $T^K$ of sets of the form

(5.2) $T^K(x_0)u^+B^{U^+}_{\eta/2}\,a^{-K}B^{U^-A}_{\eta/2}a^{K}.$

Note that for $K = N$ this gives the lemma since by construction
$|I_1| + \dots + |I_k| = 2k\lfloor 2\log M\rfloor + |V|$.

For the inductive step it will be useful to assume a slightly stronger inductive
assumption, namely that the multiplicative factor 2 is only allowed if $[0, K]$ ends
with the interval $J_j$. Therefore, notice that if the next interval is $J_{j+1}$ (i.e. $[0, K]$
ends with $I_i$) then there is not much to show. In that case we keep all of the
$\lceil e^{|J_{j+1}|}\rceil \le 2e^{|J_{j+1}|}$-many Bowen balls constructed above and obtain the claim.

So assume now that the next time interval is $I_{i+1} = [K+1, K+S]$. Here we will
make use of the geometry of geodesics that visit $X_{>M}$ during that subinterval.
Pick one of the sets (5.2) obtained in the earlier step and denote it by $Y$. By
definition of $Z_O^+$ we are only interested in points $y \in Y$ which satisfy

$T^n(y) \in X_{>M} \iff K + n \in V,$

or equivalently

$\mathrm{ht}(y), \mathrm{ht}(T(y)), \dots, \mathrm{ht}(T^{\lfloor 2\log M\rfloor}(y)) < M,$
$\mathrm{ht}(T^{\lfloor 2\log M\rfloor + 1}(y)), \dots, \mathrm{ht}(T^{S - \lfloor 2\log M\rfloor}(y)) > M,$
$\mathrm{ht}(T^{S - \lfloor 2\log M\rfloor + 1}(y)), \dots, \mathrm{ht}(T^{S}(y)) < M.$

If there is no such point in $Y$ there is nothing to show. So suppose $y, y' \in Y$ are
such points. We will use the above restrictions on the heights to show that if

(5.3) $y = T^K(x_0)u^+u^+(t)v$ and $y' = T^K(x_0)u^+u^+(t')v'$

for $u^+(t), u^+(t') \in B^{U^+}_{\eta/2}$ and $v, v'$ in the conjugate of $B^{U^-A}_{\eta/2}$, then $|t - t'| \ll e^{-S/2}$.
We can draw the geodesic orbits defined by $y$ and $y'$ in the upper half-plane model of the
hyperbolic plane and relate the conditions on $y, y'$ to geometric information about
these geodesics. We choose the lifting of the paths in such a way that the time
interval for which the height is above $M$ becomes the part of the geodesic where
the imaginary part is above $M^2$.

For the translation of the properties we will use the following observation: For
two points $z_1, z_2 \in \mathbb{H}$ on a geodesic line that are either both on the upwards part
or both on the downwards part of the corresponding semi-circle their hyperbolic
distance satisfies

(5.4) $|\log \mathrm{Im}(z_1) - \log \mathrm{Im}(z_2)| \le d(z_1, z_2) \le |\log \mathrm{Im}(z_1) - \log \mathrm{Im}(z_2)| + 1.$

The lower bound actually gives the shortest distance between points with imaginary
part $\mathrm{Im}(z_1)$ and points with imaginary part $\mathrm{Im}(z_2)$. The upper bound gives the
length of a path that first connects the point lower down, say $z_1$, to the point $z'$
immediately above with imaginary part $\mathrm{Im}(z_2)$ and then moves horizontally to a
point that is $\mathrm{Im}(z_2)$ far to the left or right of $z'$ towards $z_2$. For two points $z_1, z_2$
on the upwards or downwards part of a semi-circle this path actually goes through $z_2$.
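
The two-sided estimate (5.4) is easy to test numerically. The following sketch is only a sanity check, not part of the proof: it uses the standard formula $\cosh d(z_1,z_2) = 1 + |z_1 - z_2|^2 / (2\,\mathrm{Im}\,z_1\,\mathrm{Im}\,z_2)$ for the hyperbolic distance in the upper half-plane and samples pairs of points on the same quarter of a semicircle centred on the real axis. The helper names are hypothetical.

```python
import math


def hyperbolic_distance(z1: complex, z2: complex) -> float:
    """Distance in the upper half-plane model."""
    return math.acosh(1.0 + abs(z1 - z2) ** 2 / (2.0 * z1.imag * z2.imag))


def point_on_semicircle(center: float, radius: float, theta: float) -> complex:
    """Point at angle theta (0 < theta < pi) on the geodesic semicircle."""
    return complex(center + radius * math.cos(theta), radius * math.sin(theta))


def bounds_54(z1: complex, z2: complex):
    """Return (lower bound, distance, upper bound) as in (5.4)."""
    lower = abs(math.log(z1.imag) - math.log(z2.imag))
    return lower, hyperbolic_distance(z1, z2), lower + 1.0


if __name__ == "__main__":
    z1 = point_on_semicircle(0.0, 50.0, 0.05)
    z2 = point_on_semicircle(0.0, 50.0, 1.2)
    print(bounds_54(z1, z2))
```

Angles in $(0, \pi/2]$ keep both points on the same side of the top of the semicircle, which is the situation in which (5.4) is applied.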

Applying the lower bound in (5.4) to the points corresponding to

$y$ and $T^{\lfloor 2\log M\rfloor + 1}(y),$

whose hyperbolic distance is $\lfloor 2\log M\rfloor + 1$, we see that $\mathrm{Im}(y) \gg 1$ (where in a slight
abuse of notation we identify $y$ with the lifted point in $\mathbb{H}$). Similarly, we get from
the upper bound for $y$ and $T^{\lfloor 2\log M\rfloor}(y)$ that $\mathrm{Im}(y) \ll 1$. Similar estimates hold for
$T^S(y)$, $y'$ and $T^S(y')$.

We assume that the points $y, y'$ are lifted in such a way that $\mathrm{Re}(y) \in [-1/2, 1/2]$
and such that $y'$ is close to $y$. Denote by $\alpha_-, \alpha_+ \in \mathbb{R}$ the backwards and forward
limit points of the geodesic defined by $y$ on the boundary of $\mathbb{H}$ and similarly by
$\alpha'_-, \alpha'_+$ the endpoints of the geodesic for $y'$. Then $|\alpha_-| \le 2 + \frac12$ since the lifting
of the point $y$ was chosen such that the times of height $> M$ in $X$ correspond to
imaginary part $> M^2$. For $y'$ this implies for small enough $\eta$ that $|\alpha'_-| \le 3$.

Let $R = \frac12|\alpha_+ - \alpha_-|$ be the radius of the half circle defined by $y$ and define $R'$
similarly for $y'$. Then the above shows $R \ll |\alpha_+| \ll R$ once $M$ and so $R$ are large
enough to make $\alpha_-$ negligible in comparison to $\alpha_+$. Similarly $R' \ll |\alpha'_+| \ll R'$.

Applying (5.4) twice, once for $y$ and the point $z$ on the same geodesic with
imaginary part $R$, and once for $z$ and $T^S(y)$, we get

(5.5) $|S - 2\log R| \ll 1$ and similarly $|S - 2\log R'| \ll 1$.

Therefore, $R \ll R' \ll R$ and so $|\alpha_+| \ll |\alpha'_+| \ll |\alpha_+|$.

Suppose $g = \begin{pmatrix} a & b\\ c & d\end{pmatrix} \in SL(2,\mathbb{R})$ defines $y = T^K(x_0)u^+u^+(t)v$ in the sense that
the natural action of $g$ maps the upwards vector at $i$ to the vector associated to
$y$ for the lifting considered above. Then $\alpha_+ = g(\infty) = \frac{a}{c}$ and $\alpha_- = g(0) = \frac{b}{d}$.
Similarly, suppose $g'$ defines $y' = T^K(x_0)u^+u^+(t')v'$ such that $\alpha'_+ = g'(\infty)$. Using
this notation we summarize what we already know about these matrices:

(5.6) $\max(|a|, |b|, |c|, |d|) \ll 1$, $R \ll |\alpha_+| = \left|\frac{a}{c}\right| \ll R$, $R \ll |\alpha'_+| \ll R$, and $|\alpha_-| = \left|\frac{b}{d}\right| \ll 1$.

Here the first estimate follows since we know roughly the position of the lift
corresponding to $y$, which means that $g$ belongs to a compact subset of $SL(2,\mathbb{R})$. We
claim the above implies that

(5.7) $1 \ll |d|$, $1 \ll |a|$, and $|c| \ll |a|R^{-1} \ll R^{-1}$.

The first estimate follows since $|b| \ll |d|$ by the last estimate in (5.6) and since
$g \in SL(2,\mathbb{R})$ belongs to a compact subset so that not both $b$ and $d$ are small. The
second claim follows similarly from the second estimate in (5.6).

To simplify the following calculation we would like to remove the elements $v, v'$
(as in (5.3)) from our consideration, but to do this we need to see how this affects
the above statements. Recall first that $v, v' \in U^-A$ and so $v(\infty) = v'(\infty) = \infty$.
Therefore, the first three estimates above remain unaffected when changing $g$ resp.
$g'$ on the right by $v^{-1}$, $(v')^{-1}$. Moreover, we have $|v^{-1}(0)| \ll \eta$ and so for small
enough $\eta$ that $1 \ll |d| \ll |cv^{-1}(0) + d|$, which implies $|gv^{-1}(0)| \ll 1$. In other words,
none of the estimates in (5.6) are affected (apart from possibly the values of the
implicit constants) by the proposed transition from $g$ to $gv^{-1}$ resp. $g'$ to $g'(v')^{-1}$
and we can assume $v = v' = e$.

Comparing the definitions of $y$ and $y'$ we get $g' = gu^+(t)^{-1}u^+(t')$. Therefore,

$\alpha'_+ = g'(\infty) = \bigl(gu^+(t'-t)\bigr)(\infty) = \frac{a(t'-t)^{-1} + b}{c(t'-t)^{-1} + d} = \frac{a + b(t'-t)}{c + d(t'-t)}.$

Since $1 \ll |a|$, $u^+(t), u^+(t') \in B^{U^+}_{\eta/2}$, and so $|t'-t| \ll \eta$, we can simplify the numerator
and obtain together with the third estimate in (5.6) that for small enough $\eta > 0$

$R \ll \left|\frac{1}{c + d(t'-t)}\right| \ll R,$

or equivalently

$R^{-1} \ll |c + d(t'-t)| \ll R^{-1}.$

Since $|c| \ll R^{-1}$ and $1 \ll |d|$ by (5.7) this implies the estimate $|t'-t| \ll R^{-1}$. Now
recall from (5.5) that $e^{S/2} \ll R$, so that we get the desired $|t'-t| \ll e^{-S/2}$.
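
The Möbius computation above can be verified with exact rational arithmetic. In the sketch below $u^+(s)$ is taken to be the lower triangular unipotent matrix with entry $s$; this is an assumption, chosen because it sends $\infty$ to $1/s$ as the displayed formula requires. The function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def u_plus(s):
    """Assumed form of u^+(s): lower triangular unipotent."""
    return ((Fraction(1), Fraction(0)), (Fraction(s), Fraction(1)))


def image_of_infinity(g):
    """g(infinity) = a/c for g = ((a, b), (c, d)); None stands for infinity."""
    (a, _b), (c, _d) = g
    if c == 0:
        return None
    return Fraction(a) / Fraction(c)


def endpoint_after_shift(g, t, t_prime):
    """Forward endpoint of g' = g u^+(t)^{-1} u^+(t')."""
    g_prime = mat_mul(mat_mul(g, u_plus(-t)), u_plus(t_prime))
    return image_of_infinity(g_prime)


if __name__ == "__main__":
    g = ((2, 1), (1, 1))  # determinant 1
    t, t_prime = Fraction(1, 3), Fraction(1, 2)
    s = t_prime - t
    (a, b), (c, d) = g
    print(endpoint_after_shift(g, t, t_prime), (a + b * s) / (c + d * s))
```

Both printed values agree, which is the identity $\alpha'_+ = (a + b(t'-t))/(c + d(t'-t))$ for this sample matrix.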

Recall next that in the current time interval $I_{i+1}$ we divide $B^{U^+}_{\eta/2}$ into $\lceil e^S\rceil$
balls of the form $u^+B^{U^+}_{e^{-S}\eta/2}$. Since all points $y'$ that belong to $Y \cap T^K(Z_O^+)$ satisfy
the estimate $|t' - t| \ll e^{-S/2}$ we see that only $\ll e^{S}e^{-S/2} = e^{S/2}$ of these can (after the
correct thickening along $AU^-$) contain an element of $Y \cap T^K(Z_O^+)$. This implies
the inductive claim if we assume $M$ is sufficiently large so that the upper bound
we got is strictly bounded from above by $\frac12e^{\lfloor 2\log M\rfloor + S/2}$.
This concludes the proof of Proposition 4.3.

5.3. Entropy and covers; proof of Theorem 5.1. For the proof of Theorem 5.1
we need to relate entropy and covers via Bowen balls. For this we need the following
(well known) result, which is proved in Appendix B below (for cocompact $\Gamma$ it
follows from Brin and A. Katok [5]).

Lemma 5.3. Let $\mu$ be an $A$-invariant measure on $X = \Gamma\backslash SL(2,\mathbb{R})$. For any $N \ge 1$
and $\epsilon > 0$ let $BC(N, \epsilon)$ be the minimal number of Bowen $N$-balls needed to cover
any subset of $X$ of measure bigger than $1 - \epsilon$. Then

$h_\mu(T) \le \lim_{\epsilon \to 0}\liminf_{N \to \infty}\frac{\log BC(N, \epsilon)}{2N}$

where $T$ is the time-one-map of the geodesic flow.

Proof of Theorem 5.1. Note first that it suffices to consider ergodic measures. For
if $\mu$ is not ergodic, we can write $\mu$ as an integral of its ergodic components
$\mu = \int \mu_t\,d\tau(t)$ for some probability space $(\mathcal{T}, \tau)$. Therefore, $\mu(X_{>M}) = \int \mu_t(X_{>M})\,d\tau(t)$
but also $h_\mu(T) = \int h_{\mu_t}(T)\,d\tau(t)$ by [25, Thm. 8.4], so that the desired estimate
follows from the ergodic case.

Suppose $\mu$ is ergodic. To apply Lemma 5.3 we need to show that most of $X$
can be covered by not too many Bowen $N$-balls. Once $M > 3$ we have that every
$T$-orbit visits $X_{<M}$, and so $\mu(X_{<M}) > 0$. By the ergodic theorem there exists for
every $\epsilon > 0$ some $K \ge 1$ such that

$Y = \bigcup_{k=0}^{K-1} T^{-k}X_{<M}$ satisfies $\mu(Y) > 1 - \epsilon$.
Moreover, also by the ergodic theorem

$\frac{1}{2N+1}\sum_{n=-N}^{N} 1_{X_{>M}}(T^n(x)) \to \mu(X_{>M})$

as $N \to \infty$ for a.e. $x \in X$. So for large enough $N$ the average on the left will be
bigger than $\kappa = \mu(X_{>M}) - \epsilon$ for any $x \in X_1$ and some subset $X_1 \subset X$ of measure
$\mu(X_1) > 1 - \epsilon$. Clearly for any $N$ the set

$Z = X_1 \cap T^{N}Y \cap T^{-N}Y$

has measure bigger than $1 - 3\epsilon$. Recall that we are interested in the asymptotics
of the minimal number of Bowen $N$-balls needed to cover $Z$. Here $N \to \infty$ while $\epsilon$
and so also $K$ remain fixed. Since we can decompose $Z$ into $K^2$ many sets of the
form

$Z' = X_1 \cap T^{N - k_1}X_{<M} \cap T^{-N - k_2}X_{<M},$

it suffices to cover these, and for simplicity of notation we assume $k_1 = k_2 = 0$.
Next we split $Z'$ into the sets $Z(V)$ as in Proposition 4.3 for the various subsets
$V \subset [-N, N]$. §5.1 shows that we need at most $e^{\frac{2\log\log M}{\log M}N}$ many of these.
Moreover, by our assumption on $X_1$ we only need to look at sets $V \subset [-N, N]$ with
$|V| \ge \kappa(2N + 1)$. Therefore, Proposition 4.3 gives that each of those sets $Z(V)$ can
be covered by $\ll_M e^{(1 - \frac{\kappa}{2})2N}$ many Bowen $N$-balls. Together we see that $Z$ can
be covered by $\ll_{M,K} e^{\frac{2\log\log M}{\log M}N + (1 - \frac{\kappa}{2})2N}$ Bowen $N$-balls. Applying Lemma 5.3 we
arrive at

$h_\mu(T) \le 1 + \frac{\log\log M}{\log M} - \frac{\mu(X_{>M}) - \epsilon}{2}$

for any $\epsilon > 0$, which proves the theorem. □
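
To get a feeling for the sizes involved, the bound can be evaluated numerically. The sketch below assumes that the inequality of Theorem 5.1 reads $h_\mu(T) \le 1 + \frac{\log\log M}{\log M} - \frac{\mu(X_{>M})}{2}$, as the covering count above suggests; the function names are hypothetical. It also encodes the consequence $\mu(X) \ge 2c - 1$ for weak* limits of measures with entropy at least $c$.

```python
import math


def entropy_upper_bound(mass_high: float, M: float) -> float:
    """Assumed form of the bound: 1 + loglog(M)/log(M) - mass_high/2."""
    if M <= 1:
        raise ValueError("M must exceed 1 so that log log M is defined")
    return 1.0 + math.log(math.log(M)) / math.log(M) - mass_high / 2.0


def cusp_mass_upper_bound(entropy: float, M: float) -> float:
    """Largest mass of X_{>M} compatible with a given entropy."""
    slack = 1.0 + math.log(math.log(M)) / math.log(M) - entropy
    return min(1.0, max(0.0, 2.0 * slack))


def limit_mass_lower_bound(c: float) -> float:
    """Mass retained by a weak* limit when all entropies are >= c."""
    return max(0.0, 2.0 * c - 1.0)


if __name__ == "__main__":
    for M in (1e2, 1e4, 1e8):
        print(M, cusp_mass_upper_bound(0.9, M))
    print(limit_mass_lower_bound(0.75))
```

As $M$ grows the term $\log\log M / \log M$ tends to zero, so an entropy of $0.9$ leaves room for at most about $0.2$ of the mass far out in the cusp.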



Appendix A. Representations of binary quadratic forms by ternary forms

In this section we establish Proposition 3.4:

Proposition. Let $Q$ be a non-degenerate, integral⁵ ternary quadratic form on $\mathbb{Z}^3$,
and let

$q(x, y) = a_1x^2 + a_2xy + a_3y^2$

be a non-degenerate binary quadratic form on $\mathbb{Z}^2$. Let $f^2$ be the greatest square
dividing $\gcd(a_1, a_2, a_3)$. Then the number $N(q)$ of embeddings of $(\mathbb{Z}^2, q)$ into $(\mathbb{Z}^3, Q)$,
modulo the action of $SO_Q(\mathbb{Z})$, is $\ll_{Q,\epsilon} f\,\max(|a_1|, |a_2|, |a_3|)^{\epsilon}$.

We recall that an embedding of $(\mathbb{Z}^2, q)$ into $(\mathbb{Z}^3, Q)$ is a linear map $\iota : \mathbb{Z}^2 \to \mathbb{Z}^3$
with the property that $Q(\iota(x)) = q(x)$. Such a proposition was established for the
first time by Venkov for $Q = x^2 + y^2 + z^2$ and extended by Pall to other quadratic
forms [26, 21]. The proposition can be deduced from Siegel's mass formula; here we
present a direct and elementary argument inspired by the adelic proof of Siegel's
mass formula as outlined by Tamagawa (cf. Weil's paper [27]).
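
For the form $Q = x^2 + y^2 + z^2$ the embeddings can be enumerated by brute force for small coefficients. The sketch below is an illustration only: it counts all embeddings, before dividing by the action of $SO_Q(\mathbb{Z})$, by searching for pairs $v_1 = \iota(1,0)$, $v_2 = \iota(0,1)$ with $Q(v_1) = a_1$, $\langle v_1, v_2\rangle = a_2$ and $Q(v_2) = a_3$, where $\langle v, w\rangle = Q(v+w) - Q(v) - Q(w)$. The function names are hypothetical, and non-degeneracy of $q$ is assumed rather than checked.

```python
from itertools import product
from math import isqrt


def Q(v):
    """The ternary form x^2 + y^2 + z^2."""
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2


def bilinear(v, w):
    """<v, w> = Q(v + w) - Q(v) - Q(w)."""
    return Q(tuple(a + b for a, b in zip(v, w))) - Q(v) - Q(w)


def vectors_of_norm(n: int):
    """All integer vectors v with Q(v) = n."""
    if n < 0:
        return []
    r = isqrt(n)
    return [v for v in product(range(-r, r + 1), repeat=3) if Q(v) == n]


def count_embeddings(a1: int, a2: int, a3: int) -> int:
    """Number of embeddings of q = a1 x^2 + a2 xy + a3 y^2 into (Z^3, Q)."""
    return sum(
        1
        for v1 in vectors_of_norm(a1)
        for v2 in vectors_of_norm(a3)
        if bilinear(v1, v2) == a2
    )


if __name__ == "__main__":
    print(count_embeddings(1, 0, 1))
    print(count_embeddings(1, 1, 1))
```

For $q = x^2 + y^2$ the raw count is 24, which matches the order of the rotation group of the cube, so there is a single embedding up to $SO_Q(\mathbb{Z})$. For $q = x^2 + xy + y^2$ there is none, because $\langle v_1, v_2\rangle$ is always even for this $Q$.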



⁵ I.e. $Q(\mathbb{Z}^3) \subset \mathbb{Z}$.
Remark A.1.
- One may wonder what the dependency on $Q$ in the above
bound looks like; this is for instance important to obtain equidistribution
results when $Q$ is allowed to vary (see for instance [14, Thm. 1.8]). In
the case where $Q$ is a multiple of the norm form on a lattice in the space
of trace zero elements of a quaternion algebra whose associated order is
an Eichler order, it can be shown that the dependency is of the shape
$\ll_\epsilon |\mathrm{disc}(Q)|^{1/2+\epsilon}\dots$. It seems plausible that this holds in general.

- The argument provides, in fact, an upper bound for the sum over a set
of representatives $Q_i$, $i = 1, \dots, g$, of the genus classes of $Q$, of the number
of embeddings of $(\mathbb{Z}^2, q)$ into $(\mathbb{Z}^3, Q_i)$ modulo $SO_{Q_i}(\mathbb{Z})$.

- Finally it is easy to see that this argument carries over without significant
changes to quadratic forms defined over a general number field.

A.1. Reduction to local counting problems. Fix an embedding $\iota : (\mathbb{Z}^2, q) \to (\mathbb{Z}^3, Q)$
and let

$L := \iota(\mathbb{Z}^2)$

be its image (if no such embedding exists, we are obviously done). Then any other
embedding $\iota'$ is (by Witt's theorem; see [22, IV.1.5, Theorem 3]) of the form $g \circ \iota$,
with $g \in SO_Q(\mathbb{Q})$. The stabilizer of $\iota$ inside $SO_Q(\mathbb{Q})$ is trivial, for any isometry
fixing $L$ pointwise would need to map $L^\perp$ to itself and so must be multiplication by
$\pm 1$ on $L^\perp$; the condition of determinant 1 forces it to be the identity. The number
of embeddings $N(L)$ (up to the action of $SO_Q(\mathbb{Z})$) is therefore precisely the number
of cosets $g \in SO_Q(\mathbb{Z})\backslash SO_Q(\mathbb{Q})$ so that $gL \subset \mathbb{Z}^3$.

Given a rational lattice $\Lambda \subset \mathbb{Q}^3$, for any prime $p$ we denote by

$\Lambda_p = \Lambda \otimes_{\mathbb{Z}} \mathbb{Z}_p$

its closure inside $\mathbb{Q}_p^3$. Let us recall that the map

$\Lambda \mapsto (\Lambda_p)_p$

is a bijection between the set of lattices in $\mathbb{Q}^3$ and the set of sequences of lattices
indexed by the primes $(\Lambda_p)_p$, $\Lambda_p \subset \mathbb{Q}_p^3$, such that $\Lambda_p = \mathbb{Z}_p^3$ for a.e. $p$. Write
$K_p = SO_Q(\mathbb{Z}_p)$ for the stabilizer of $\mathbb{Z}_p^3$ inside $SO_Q(\mathbb{Q}_p)$ and let

$SO_Q(\mathbb{A}_f) = \{g_f = (g_p)_p,\ g_p \in SO_Q(\mathbb{Q}_p),\ g_p \in SO_Q(\mathbb{Z}_p) \text{ for a.e. } p\};$

the above bijection induces an action of $SO_Q(\mathbb{A}_f)$ on the set of rational lattices:

$g_f.\Lambda \leftrightarrow g_f.(\Lambda_p)_p := (g_p\Lambda_p)_p.$

Remark A.2. The group $SO_Q(\mathbb{A}_f)$ is the group of finite adeles of $SO_Q$. The
$SO_Q(\mathbb{A}_f)$-orbit of a lattice $\Lambda \subset \mathbb{Q}^3$ under this action is called the $Q$-genus of $\Lambda$.
We will not need much of this terminology or discuss further properties of adelic
groups here.

The group $SO_Q(\mathbb{Q})$ embeds diagonally into $SO_Q(\mathbb{A}_f)$. Now the stabilizer of $\mathbb{Z}^3$ in
$SO_Q(\mathbb{A}_f)$ is $K_f = \prod_p SO_Q(\mathbb{Z}_p)$ and since $K_f \cap SO_Q(\mathbb{Q}) = SO_Q(\mathbb{Z})$, $SO_Q(\mathbb{Z})\backslash SO_Q(\mathbb{Q})$
injects into $K_f\backslash SO_Q(\mathbb{A}_f)$.
Consequently, letting $L_p = L \otimes_{\mathbb{Z}} \mathbb{Z}_p$ be the closure of $L$ inside $\mathbb{Z}_p^3$ we have

$N(L) \le |\{g_f \in K_f\backslash SO_Q(\mathbb{A}_f) : g_f.L \subset \mathbb{Z}^3\}|$
$\le \prod_p |\{g_p \in SO_Q(\mathbb{Z}_p)\backslash SO_Q(\mathbb{Q}_p) : g_p.L_p \subset \mathbb{Z}_p^3\}|$
$= \prod_p |\{g_p \in SO_Q(\mathbb{Q}_p)/SO_Q(\mathbb{Z}_p) : L_p \subset g_p\mathbb{Z}_p^3\}| = \prod_p N(L_p)$

with

$N(L_p) = |\{g_p \in SO_Q(\mathbb{Q}_p)/K_p : L_p \subset g_p\mathbb{Z}_p^3\}| = |\{\Lambda \in SO_Q(\mathbb{Q}_p).\mathbb{Z}_p^3 : L_p \subset \Lambda\}|$

being the number of lattices in $\mathbb{Q}_p^3$, within the $Q$-isometry class of $\mathbb{Z}_p^3$, that contain
$L_p$. We have proven that

$N(L) \le \prod_p N(L_p)$

and thus have reduced our counting problem to a collection of local counting
problems (as we will see below $N(L_p) = 1$ for a.e. $p$); a more careful analysis of what
we have said so far is very closely related to the proof of the mass formula. In the
present paper, however, we need only upper bounds.

A.2. The anisotropic case and a reduction step. We first introduce some
notation. We denote by

$\langle x, x'\rangle = Q(x + x') - Q(x) - Q(x')$

the bilinear form associated with $Q$; so $\langle x, x\rangle = 2Q(x)$. The discriminant of $Q$ is
set to be

$\mathrm{disc}(Q) = \det(\langle x_i, x_j\rangle)_{i,j \le 3}$

for $\{x_1, x_2, x_3\}$ any basis of $\mathbb{Z}^3$. Since $Q$ is integral, $\langle\mathbb{Z}^3, \mathbb{Z}^3\rangle \subset \mathbb{Z}$, so $\mathrm{disc}(Q)$ is a
non-zero integer.

We notice first that if $Q$ does not represent 0 nontrivially over $\mathbb{Q}_p$ (i.e. is
anisotropic over $\mathbb{Q}_p$), then $SO_Q(\mathbb{Q}_p)$ is compact and

(A.1) $N(L_p) \le [SO_Q(\mathbb{Q}_p) : SO_Q(\mathbb{Z}_p)] \ll_Q 1.$

This (favorable) situation can occur only if $p$ divides $\mathrm{disc}(Q)$.

We suppose now that $Q$ is isotropic over $\mathbb{Q}_p$ for some prime $p \mid 2\,\mathrm{disc}(Q)$; we will
reduce the problem of bounding $N(L_p)$ to the case where the integral quadratic form
is given by $Q(x, y, z) = xy + z^2$. We note that $\mathrm{disc}(xy + z^2) = 2$. This reduction
comes with the cost that we also have to replace $q$ by a different quadratic form
$q' = up^{m_p}q$ with $u \in \mathbb{Z}_p^\times$ and $m_p \ge 0$. However, we only have to make this change
for $p \mid 2\,\mathrm{disc}(Q)$ and $m_p$ will only depend on $Q$. Using these facts we will see in
Subsection A.7 that the bound for the number of representations of $q'$ by $xy + z^2$
will suffice for the proof of Proposition 3.4.
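
The definition of the discriminant can be made concrete with a few lines of code. The sketch below computes the Gram matrix of $\langle x, x'\rangle = Q(x + x') - Q(x) - Q(x')$ on the standard basis of $\mathbb{Z}^3$ and its determinant; the function names are hypothetical. With this definition the determinant for $xy + z^2$ evaluates to $-2$, so the value 2 quoted above is recovered up to sign.

```python
def bilinear(Q, x, y):
    """<x, y> = Q(x + y) - Q(x) - Q(y) for a quadratic form Q on Z^3."""
    s = tuple(a + b for a, b in zip(x, y))
    return Q(s) - Q(x) - Q(y)


def gram_determinant(Q) -> int:
    """Determinant of the Gram matrix of Q on the standard basis."""
    basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    g = [[bilinear(Q, u, v) for v in basis] for u in basis]
    return (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )


def hyperbolic_plus_square(v):
    """The form xy + z^2."""
    x, y, z = v
    return x * y + z * z


def sum_of_squares(v):
    """The form x^2 + y^2 + z^2."""
    return sum(c * c for c in v)


if __name__ == "__main__":
    print(gram_determinant(hyperbolic_plus_square))
    print(gram_determinant(sum_of_squares))
```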

We claim that there exists a basis of $\mathbb{Q}_p^3$ over $\mathbb{Q}_p$ so that the quadratic form
$Q$ with respect to the coordinates of this basis has the form $up^{-\ell}(xy + z^2)$ for
some $u \in \mathbb{Z}_p^\times$ and $\ell \in \{0, 1\}$. Indeed as $Q$ is isotropic, there exists a hyperbolic
plane in $\mathbb{Q}_p^3$. Complementing the basis of the hyperbolic plane with a vector of the
orthogonal complement we arrive at a basis so that $Q$ has the form $xy + up^{-\ell}z^2$
with $u \in \mathbb{Z}_p^\times$ and $\ell \in \mathbb{Z}$. If necessary we may replace the last basis vector by a
multiple and can ensure that $\ell \in \{0, 1\}$. Similarly we may divide the first basis
vector by $up^{-\ell}$ and arrive at the claim.

Let $\Lambda$ be the $\mathbb{Z}_p$-lattice in $\mathbb{Q}_p^3$ spanned by the above basis. There exists some $k$
(depending only on $\Lambda$) so that $p^k\mathbb{Z}_p^3 \subset \Lambda$. Let $\iota : (\mathbb{Z}_p^2, q) \to (\mathbb{Z}_p^3, Q)$ be an embedding
of $q$. Then $p^k\iota : (\mathbb{Z}_p^2, p^{2k}q) \to (\Lambda, Q)$ and finally

$p^k\iota : (\mathbb{Z}_p^2, u^{-1}p^{2k+\ell}q) \to (\Lambda, u^{-1}p^{\ell}Q) \simeq (\mathbb{Z}_p^3, xy + z^2)$

are also embeddings of quadratic lattices. We set $m_p = 2k + \ell$ and $q' = u^{-1}p^{m_p}q$ and
obtain that there is an injection from the set of embeddings $\iota : (\mathbb{Z}_p^2, q) \to (\mathbb{Z}_p^3, Q)$
to the embeddings $\iota' : (\mathbb{Z}_p^2, q') \to (\mathbb{Z}_p^3, xy + z^2)$.

A.3. The case of an unramified lattice. The previous section reduces the proof
of Proposition 3.4 to the problem of finding an upper bound for $N(L_p)$ where we
may assume that either $p \nmid 2\,\mathrm{disc}(Q)$ or that $Q(x, y, z) = xy + z^2$. This will be
done in the following two local counting lemmas which depend on whether $p = 2$
or $p > 2$:

Recall that for $p > 2$ any quadratic form $q$ on some rank two $\mathbb{Z}_p$-lattice $L$ taking
values in $\mathbb{Z}_p$ may be written, in a suitable basis, in the form

(A.2) $q(xe_1 + ye_2) = up^{a}x^2 + vp^{b}y^2$, $u, v \in \mathbb{Z}_p^\times$, $0 \le a \le b \in \mathbb{Z}_{\ge 0}$.

To see this take an element $e_1 \in L$ such that the valuation of $q(e_1)$ is minimal
and then take the orthogonal complement of $e_1$, cf. [7, Sect. 8.3]. We shall call the
integers $a \le b$ the invariants of the quadratic form (e.g. the invariants of $x^2 + 5y^2$
over $\mathbb{Z}_5$ are $(0, 1)$). This is a kind of $p$-adic analogue of the notion of successive
minima. The invariants determine the quadratic form over $\mathbb{Z}_p$, up to isometry,
up to $O(1)$ possibilities. We will prove the following lemma.
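
For a form that is already diagonal with non-zero integer coefficients, the invariants are just the sorted $p$-adic valuations of the two coefficients. The sketch below illustrates this special case only; diagonalising a general form as in (A.2) is not attempted, and the function names are hypothetical.

```python
def valuation(n: int, p: int) -> int:
    """p-adic valuation of a non-zero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def invariants(coefficients, p: int):
    """Invariants (a, b) of the diagonal form c1 x^2 + c2 y^2 over Z_p."""
    a, b = sorted(valuation(c, p) for c in coefficients)
    return a, b


if __name__ == "__main__":
    print(invariants((1, 5), 5))   # the example x^2 + 5 y^2 over Z_5
    print(invariants((50, 3), 5))
```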

Lemma A.3. Let $p > 2$, let $Q$ be an isotropic quadratic form over $\mathbb{Q}_p$ so that
$p \nmid \mathrm{disc}(Q)$. Let $L \subset \Lambda$ be a rank 2 sublattice such that $Q|_L$ has invariants $(a, b)$,
then

$N(L; \Lambda) := |\{\Lambda' \in SO_Q(\mathbb{Q}_p).\Lambda : L \subset \Lambda'\}| \ll (b + 1)^2p^{\lfloor a/2\rfloor}$

where the implied constant is absolute. Moreover, if $(a, b) = (0, 0)$, $N(L; \Lambda) = 1$.

In the 2-adic case, any quadratic form $q$ on some rank 2 $\mathbb{Z}_2$-lattice $L$ taking values
in $\mathbb{Z}_2$ may be written, in a suitable basis, either (cf. [16, Lemma 2.1] and [7, Sect.
8.4]) in the form

(A.3) $q(xe_1 + ye_2) = u2^{a}x^2 + v2^{b}y^2$, $u, v \in \mathbb{Z}_2^\times$, $0 \le a \le b \in \mathbb{Z}_{\ge 0}$,

or in the form

(A.4) $q(xe_1 + ye_2) = u2^{b}x^2 + w2^{a}xy + v2^{b}y^2$, $u, v, w \in \mathbb{Z}_2^\times$, $0 \le a \le b \in \mathbb{Z}_{\ge 0}$.

In both cases we will refer to $a \le b$ once more as the invariants of $q$. We have the
following lemma.

Lemma A.4. Consider $Q(x, y, z) = xy + z^2$ as a quadratic form over $\mathbb{Q}_2^3$. Let
$\Lambda \subset \mathbb{Q}_2^3$ be a lattice satisfying $Q(\Lambda) \subset \mathbb{Z}_2$ and which is maximal for this property.
Let $L \subset \Lambda$ be a rank 2 sublattice such that $Q|_L$ has invariants $(a, b)$, then

$N(L; \Lambda) \ll (b + 1)^2\,2^{\lfloor a/2\rfloor}$

where the implied constant is absolute.
The proof of these two lemmas will use a geometric interpretation of the quotient
$SO_Q(\mathbb{Q}_p)/SO_Q(\Lambda)$.

A.4. The Bruhat-Tits tree. Let $Q$ be an isotropic quadratic form such that
$p \nmid \mathrm{disc}(Q)$ or $Q(x, y, z) = xy + z^2$. Note that $\Lambda_0 = \mathbb{Z}_p^3$ has the property that
$Q(\Lambda_0) \subset \mathbb{Z}_p$ and that $\Lambda_0$ is maximal for this property. We set

$\mathcal{T}_Q = SO_Q(\mathbb{Q}_p).\Lambda_0 \simeq SO_Q(\mathbb{Q}_p)/K_p.$

Even though this will not be used here, let us also mention that $\mathcal{T}_Q$ is the set of all
lattices $\Lambda$ in $\mathbb{Q}_p^3$ such that

$Q(\Lambda) \subset \mathbb{Z}_p$

and which are maximal for this property (see [15, Cor. 4.17]).

We will need that $\mathcal{T}_Q$ has the structure of a $(p + 1)$-regular tree (see [6]) in which
$\Lambda, \Lambda'$ are adjacent if and only if $\Lambda \cap \Lambda'$ has index $p$ in $\Lambda$ (or equivalently in $\Lambda'$).
More generally, the distance $d(\Lambda, \Lambda')$ between two vertices $\Lambda, \Lambda'$ satisfies

$p^{d(\Lambda, \Lambda')} = [\Lambda : \Lambda \cap \Lambda'] = [\Lambda' : \Lambda \cap \Lambda'],$

and the geodesic between $\Lambda$ and $\Lambda'$ consists of all $\Lambda'' \in \mathcal{T}_Q$ satisfying $\Lambda \cap \Lambda' \subset \Lambda''$.

Let us describe the adjacency structure on $\mathcal{T}_Q$ more explicitly using the quadratic
structure. Given any lattice $\Lambda \in \mathcal{T}_Q$, and any primitive $v \in \Lambda$ (i.e. $v \notin p\Lambda$) for
which $\bar v = v + p\Lambda \in \Lambda/(p\Lambda)$ is isotropic over $\mathbb{F}_p$ (i.e. $p \mid Q(v)$) we can define a
lattice $\Lambda_v \in \mathcal{T}_Q$, which only depends on the line through $\bar v$, as follows. Since

(A.5) $Q(av + z) = a^2Q(v) + Q(z) + a\langle z, v\rangle \in \mathbb{Z}_p$

and since the linear form $\langle\cdot, v\rangle$ is not zero (even for $p = 2$), we may modify $v$ by
some element $pz_0 \in p\Lambda$ to ensure that $p^2 \mid Q(v + pz_0)$. Here the element $z_0$ is
uniquely determined by $v$ up to $\{z \in \Lambda : \langle z, v\rangle \equiv 0 \bmod p\}$. Therefore, the lattice

$\Lambda_v := \frac{1}{p}\mathbb{Z}_p(v + pz_0) + \{z \in \Lambda : \langle z, v\rangle \equiv 0 \bmod p\}$

depends only on $v$, indeed only on the line through $\bar v$. Using (A.5) we see quickly
that $Q(\Lambda_v) \subset \mathbb{Z}_p$. Below we will always assume that $p^2 \mid Q(v)$ and set $z_0 = 0$.

Under our assumptions on $Q$ this lattice $\Lambda_v \in \mathcal{T}_Q$ is a neighbor of $\Lambda$, and there
are exactly $p + 1 = |\mathbb{P}^1(\mathbb{F}_p)|$ such lines, and thus every neighbor arises.
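
The count of $p + 1$ isotropic lines can be confirmed by direct enumeration for the model form $xy + z^2$. The sketch below lists the isotropic lines in $\mathbb{F}_p^3$ by normalising each isotropic vector so that its first non-zero coordinate is 1; the function name is hypothetical, and the check only covers this one form, not a general $Q$ with $p \nmid \mathrm{disc}(Q)$.

```python
from itertools import product


def isotropic_lines(p: int):
    """Isotropic lines of xy + z^2 in F_p^3, one normalised vector per line."""
    lines = set()
    for v in product(range(p), repeat=3):
        if v == (0, 0, 0):
            continue
        if (v[0] * v[1] + v[2] * v[2]) % p != 0:
            continue
        pivot = next(c for c in v if c)  # first non-zero coordinate
        inverse = pow(pivot, -1, p)      # modular inverse (Python 3.8+)
        lines.add(tuple(c * inverse % p for c in v))
    return lines


if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        print(p, len(isotropic_lines(p)))
```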

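We will use also the following simple facts:

(1) For an isotropic $v$ we have

$\Lambda \cap \Lambda_v = \mathbb{Z}_pv + \{z \in \Lambda : \langle v, z\rangle \equiv 0 \bmod p\}.$

(2) For $v, v'$ generating distinct isotropic lines the intersection

$\Lambda_v \cap \Lambda_{v'} = \{z \in \Lambda : \langle v, z\rangle \equiv \langle v', z\rangle \equiv 0 \bmod p\} = \mathbb{Z}_pw + p\Lambda$

is the preimage in $\Lambda$ of the orthogonal subspace $(\mathbb{F}_p\bar v + \mathbb{F}_p\bar v')^\perp \subset \mathbb{F}_p^3$.

(3) Given three isotropic vectors $v, v', v''$ generating distinct lines and
assuming $p > 2$ we have

$\Lambda_v \cap \Lambda_{v'} \cap \Lambda_{v''} = p\Lambda.$

One establishes also the following generalization: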
Proposition A.5. Let $\Lambda$ lie at the mid-point of the geodesic between $\Lambda'$ and $\Lambda''$
(i.e. there is $n \ge 1$ such that $d(\Lambda, \Lambda') = d(\Lambda, \Lambda'') = n$, $d(\Lambda', \Lambda'') = 2n$). There
exists a primitive $v \in \Lambda$ so that $Q(v) \equiv 0\ (p^n)$ and $w \in \Lambda$ with $Q(w) \not\equiv 0\ (p)$ and
$\langle v, w\rangle \equiv 0\ (p^n)$ so that

$\Lambda \cap \Lambda' = \{z \in \Lambda,\ \langle z, v\rangle \equiv 0\ (p^n)\} = \mathbb{Z}_pv + \mathbb{Z}_pw + p^n\Lambda$

and

$\Lambda' \cap \Lambda'' = \mathbb{Z}_pw + p^n\Lambda$

is the preimage of the non-isotropic line defined by $w$ under the projection
$\Lambda \to \Lambda/p^n\Lambda$. Moreover, for $m \le n$, let $\Lambda'_m$ be the lattice on the segment $[\Lambda, \Lambda']$ at
distance $m$ from $\Lambda$, then

$\Lambda \cap \Lambda'_m = \{z \in \Lambda,\ \langle z, v\rangle \equiv 0\ (p^m)\} = \mathbb{Z}_pv + \mathbb{Z}_pw + p^m\Lambda \supset \Lambda \cap \Lambda'.$

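A.5. Proof of Lemma A.3. Let $p > 2$ and $Q$ be as in the lemma. Define

$\mathcal{R}(L) := \{\Lambda \in \mathcal{T}_Q,\ L \subset \Lambda\} \subset \mathcal{T}_Q$, $N(L) = |\mathcal{R}(L)|$.

In the notation of Lemma A.3, $N(L) = N(L; \Lambda)$ for any $\Lambda \in \mathcal{T}_Q$.

We start by remarking that $\mathcal{R}(L)$ is connected: if $\Lambda, \Lambda'$ both contain $L$, then
$L \subset \Lambda \cap \Lambda' \subset \Lambda''$ for any $\Lambda''$ on the geodesic path between $\Lambda$ and $\Lambda'$.

Let $q$ be as in (A.2). Suppose $\mathcal{R}(L)$ is non-empty and let $\iota : (\mathbb{Z}_p^2, q) \to (\Lambda, Q)$ be
an isometric embedding with image $L = \iota(\mathbb{Z}_p^2)$ and let $e_1 = \iota(1, 0)$, $e_2 = \iota(0, 1)$ so

$Q(e_1) = up^{a}$, $Q(e_2) = vp^{b}$, $\langle e_1, e_2\rangle = 0$.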

A.5.1. The case $(a, b) = (0, 0)$. We show $\mathcal{R}(L) = \{\Lambda\}$. If not, $L$ is also contained
in a neighbor $\Lambda_v$ of $\Lambda$. However, the induced quadratic form on the span of $\bar e_1, \bar e_2$ is
nondegenerate, so this span cannot be $\bar v^\perp$ for an isotropic $\bar v \in \Lambda/p\Lambda$. So $N(L) = 1$.

A.5.2. The case $a = 0$, $b \ge 1$. Suppose that $N(L) > 1$. Then there is an isotropic
$\bar v$ so that $\bar e_1$ belongs to $\bar v^\perp$. This shows that $\bar e_1^\perp$ is a hyperbolic plane (first modulo
$p$, and then since $p \ne 2$ also on $\mathbb{Q}_p^3$).

In other words, $e_1^\perp \cap \Lambda$ is a rank two lattice generated by two isotropic vectors
$v, v'$ (which are liftings of isotropic vectors generating $\bar e_1^\perp$) and then, there are
exactly two neighboring lattices containing $e_1$, namely $\Lambda_v$ and $\Lambda_{v'}$; that there are
at most two follows from Fact (3). Pursuing this reasoning, we see that the only
lattices that can contain $e_1$ are the lattices

$\Lambda_n := \mathbb{Z}_pp^{-n}v + \mathbb{Z}_pe_1 + \mathbb{Z}_pp^{n}v'$, $n \in \mathbb{Z}$

(which is a geodesic in the tree determined by $e_1$).

Let us now see that for $n > b$, $\Lambda_{\pm 2n}$ does not contain $e_2$, which will show that
$N(L) \le 4b + 3$. Suppose $e_2 \in \Lambda_{2n}$, then

$e_2 \in \Lambda \cap \Lambda_{2n} = \mathbb{Z}_pe_1 + p^n\Lambda_n;$

writing $e_2 = \alpha e_1 + z$, $\alpha \in \mathbb{Z}_p$, $z \in p^n\Lambda_n$, we obtain

$\langle e_1, e_2\rangle = 0 \equiv \alpha \pmod{p^n}$, $Q(e_2) = vp^{b} \equiv \alpha^2 \equiv 0 \pmod{p^n}$.

This is a contradiction for $n > b$.
A.5.3. The case $a = 1$. We show $N(L) \le 2$: Suppose that $L \subset \Lambda_v$ for some $v$. Since
$\bar e_1 \in \Lambda/p\Lambda$ is a non-zero isotropic vector contained in $\bar v^\perp$ it has to be a multiple of
$\bar v$. By symmetry between $\Lambda$ and $\Lambda_v$, this also shows that $\Lambda$ is the only neighbor of
$\Lambda_v$ which contains $L$. Since $\mathcal{R}(L)$ is a connected subset of the tree, this shows that
$N(L) \le 2$ as claimed.

A.5.4. The case $a \ge 2$. Let

$L_1 := \mathbb{Z}_pe_1' + \mathbb{Z}_pe_2$, $L_2 := \mathbb{Z}_pe_1 + \mathbb{Z}_pe_2'$, $e_i' = e_i/p$, $i = 1, 2$, $L_1 + L_2 = \frac{1}{p}L$;

these are rank 2 lattices containing $L$, on which $Q$ is $\mathbb{Z}_p$-valued with respective
invariants $(a - 2, b)$, $(a, b - 2)$ and $(a - 2, b - 2)$. We will show that either $N(L) = 1$
or

(A.6) $\mathcal{R}(L) \subset \mathcal{R}(L_1) \cup \mathcal{R}(L_2) \cup \bigcup_{\Lambda' \in \mathcal{R}(\frac{1}{p}L)} B(\Lambda', 1)$

where $B(\Lambda', d) = \{\Lambda'' \in \mathcal{T}_Q,\ d(\Lambda', \Lambda'') \le d\}$ is the ball in the tree of radius $d$
centered at $\Lambda'$; it has cardinality $1 + (p + 1)\frac{p^d - 1}{p - 1} \ll p^d$.

Here is the proof of (A.6). Let $\Lambda \in \mathcal{R}(L)$. If $e_1 \in p\Lambda$ or $e_2 \in p\Lambda$, then
$\Lambda \in \mathcal{R}(L_1) \cup \mathcal{R}(L_2)$. So suppose now $e_1, e_2 \in \Lambda$ are both primitive vectors. By
assumption, we have for $i = 1, 2$ (since $Q(e_i) \equiv 0 \pmod p$) that $\bar e_i$ is a non-zero
isotropic vector. Since $\langle e_1, e_2\rangle = 0$, $\bar e_1$ and $\bar e_2$ have to be co-linear; otherwise the
induced form on the reduction $\bar\Lambda$ would be identically zero on a plane. Now $\Lambda_{e_1}$
contains both $L_1$ and $L_2$; so it belongs to $\mathcal{R}(\frac{1}{p}L)$. Thus $\Lambda$ is at distance at most 1
from $\mathcal{R}(\frac{1}{p}L)$.
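
The cardinality of a ball in a $(p + 1)$-regular tree, used in the statement of (A.6), can be checked by enumerating vertices. The sketch below is a minimal illustration with hypothetical names: it models vertices as non-backtracking words (the root has $p + 1$ neighbours, every other vertex has $p$ neighbours further out) and compares the resulting count with the closed form $1 + (p + 1)\frac{p^d - 1}{p - 1}$.

```python
def ball_size(p: int, d: int) -> int:
    """Number of vertices within distance d of a vertex in a (p+1)-regular tree."""
    frontier = [()]  # the centre is the empty word
    total = 1
    for _ in range(d):
        nxt = []
        for word in frontier:
            branching = p + 1 if not word else p  # no step back towards the centre
            nxt.extend(word + (i,) for i in range(branching))
        frontier = nxt
        total += len(frontier)
    return total


def closed_form(p: int, d: int) -> int:
    """1 + (p + 1)(p^d - 1)/(p - 1)."""
    return 1 + (p + 1) * (p ** d - 1) // (p - 1)


if __name__ == "__main__":
    for d in range(5):
        print(d, ball_size(3, d), closed_form(3, d))
```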

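Let us now see how to conclude the proof of Lemma A.3: for $r, s \in \mathbb{N}$, let

$$L_{r,s} := \mathbb{Z}_p p^{-r} e_1 + \mathbb{Z}_p p^{-s} e_2.$$

$Q$ takes integral values on $L_{r,s}$ for $r \le \lfloor a/2 \rfloor$, $s \le \lfloor b/2 \rfloor$. In this notation (A.6)
states

$$\mathcal{R}(L_{0,0}) \subset \mathcal{R}(L_{1,0}) \cup \mathcal{R}(L_{0,1}) \cup \bigcup_{\Lambda' \in \mathcal{R}(L_{1,1})} B(\Lambda', 1).$$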

We can now apply (A.6) again to each of the terms on the right. With each
application $r$ or $s$ or both increase by 1. In the latter case we obtain that the
previous lattice $\Lambda' \in \mathcal{R}(L_{r,s})$ (to which (A.6) was applied) is at distance 1 from the
new lattice $\Lambda'' \in \mathcal{R}(L_{r+1,s+1})$. Also note that in the latter case both $a$ and $b$ are
reduced by 2, so that this case can only happen $\le \lfloor a/2 \rfloor$ many times. Therefore,
induction on $a + b$ shows that

$$\mathcal{R}(L) = \mathcal{R}(L_{0,0}) \subset \bigcup \{ B(L_{\lfloor a/2 \rfloor, s}, \lfloor a/2 \rfloor),\ B(L_{r, \lfloor b/2 \rfloor}, \lfloor a/2 \rfloor) : 0 \le r, s \le \lfloor b/2 \rfloor \}.$$

Each $L' = L_{\lfloor a/2 \rfloor, s}$ resp. $L' = L_{r, \lfloor b/2 \rfloor}$ has invariants $(0, b')$ or $(1, b')$ with $b' \le b$ and
by the previous sections $N(L') = O(b + 1)$ in all cases. Consequently

$$N(L) \ll \sum_{L'} \sum_{\Lambda' \in \mathcal{R}(L')} |B(\Lambda', \lfloor a/2 \rfloor)| \ll (b+1)^2 p^{\lfloor a/2 \rfloor}.$$

□
A.6. Proof of Lemma A.4. Recall that we assume that $Q(x,y,z) = xy + z^2$.
Note that $(1, 0, 0)$, $(0, 1, 0)$ and $(-1, 1, 1)$ are three isotropic vectors that are linearly
independent modulo 2, which define the neighbors of $\mathbb{Z}_2^3$. For every pair $f_1, f_2$ of
these vectors we can find a third vector $f_3 \in \mathbb{Z}_2^3$ so that $Q(xf_1 + yf_2 + zf_3) = xy + z^2$.
Of the four non-zero non-isotropic vectors modulo 2 the vector $k = (0, 0, 1)$ is
special: it is the only element in the kernel of $(\cdot,\cdot)$ modulo 2 and also satisfies
$k = f_3$ modulo 2 for any basis $(f_1, f_2, f_3)$ as above. Below we will always use the
letter $k$ to denote the corresponding element in the lattice $\Lambda/2\Lambda$.
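The counts in this paragraph can be confirmed by enumerating the seven non-zero vectors of $\mathbb{F}_2^3$. The sketch below assumes the bilinear form is $(v, w) = Q(v+w) - Q(v) - Q(w)$, consistent with the identity $(e, e) = 2Q(e)$ used later in this appendix; the helper names are illustrative.

```python
from itertools import product


def Q(v):
    """The ternary form Q(x, y, z) = xy + z^2."""
    x, y, z = v
    return x * y + z * z


def B(v, w):
    """Associated bilinear form (v, w) = Q(v + w) - Q(v) - Q(w)."""
    s = tuple(a + b for a, b in zip(v, w))
    return Q(s) - Q(v) - Q(w)


# All non-zero vectors of F_2^3, represented by 0/1 coordinates.
nonzero = [v for v in product((0, 1), repeat=3) if any(v)]
isotropic = [v for v in nonzero if Q(v) % 2 == 0]
non_isotropic = [v for v in nonzero if Q(v) % 2 == 1]
# Vectors orthogonal modulo 2 to every vector.
kernel = [v for v in nonzero if all(B(v, w) % 2 == 0 for w in nonzero)]

if __name__ == "__main__":
    print(isotropic)      # reductions of (1,0,0), (0,1,0), (-1,1,1)
    print(len(non_isotropic), kernel)
```

The enumeration finds three isotropic vectors, four non-isotropic ones, and the single kernel vector $(0,0,1)$, which is in particular orthogonal modulo 2 to all three isotropic vectors.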

A.6.1. The diagonal case (A.3). Suppose that in a suitable basis $q$ takes the form
(A.3). This situation is similar to the proof of Lemma A.3. We only discuss the
details where the two proofs differ.

A.6.2. The case (a,b) = (0,0). We claim that $\Lambda \in \mathcal{R}(L)$ has at most one neighbor
in $\mathcal{R}(L)$. If one of $\bar e_1$ or $\bar e_2$ is not equal to $k$, then we claim that $\mathcal{R}(L)$ contains at
most one neighbor of $\Lambda$. To see this suppose $\bar e_1 \neq k$ and $L \subset \Lambda_v \cap \Lambda_{v'}$. Then by
Fact (2) $L$ is contained modulo 2 in the common kernel of $(\cdot, v)$ and $(\cdot, v')$, which is
one-dimensional and actually equal to the span of $k$, a contradiction. Therefore,
$L \subset \Lambda \cap \Lambda_v$ for at most one neighbor $\Lambda_v$ as claimed.

So suppose $\bar e_1 = \bar e_2 = k$ and $w \in \Lambda$ is such that $Q(xe_1 + y(e_1 + 2w)) = ux^2 + vy^2$
as in (A.3). Since we also have

$$Q(xe_1 + y(e_1 + 2w)) = x^2 Q(e_1) + y^2 Q(e_1 + 2w) + xy\,(2Q(e_1) + 2(e_1, w))$$

and $2 \mid (e_1, w)$, it follows that $Q(xe_1 + y(e_1 + 2w))$ is not as in (A.3). So we have
seen that in all possible cases we have at most one neighbor of $\Lambda$ in $\mathcal{R}(L)$.
This shows $N(L) \le 2$ for $(a, b) = (0, 0)$.

A.6.3. The case a = 0 and b ≥ 1. We claim that the main difference between the
case of $p = 2$ and $p > 2$ lies in this case. Here we will see that $\mathcal{R}(L)$ is only
contained in the set of elements of distance one to points on a geodesic. This is
caused by the fact that if $\bar e_1 = k$ and $\bar e_2 = 0$, then $\mathcal{R}(L)$ contains all neighbors of $\Lambda$
due to Fact (1) and since $k$ is orthogonal to all three nonzero isotropic vectors in
$\Lambda/2\Lambda$.

On the other hand, we have already seen above (in the case $a = 0$, $b = 0$) that if
$\bar e_1 \neq k$ then only one neighbor of $\Lambda$ can be in $\mathcal{R}(L)$. To prove that $\mathcal{R}(L)$ consists
of points of distance one from a geodesic we only have to show that if $\bar e_1 = k$,
then for at least one neighbor $\Lambda'$ of $\Lambda$ we have $\bar e_1 \neq k'$ where $k' \in \Lambda'/2\Lambda'$ is the
corresponding special vector for $\Lambda'$. This follows if we can find some vector $w \in \Lambda'$
with $(\bar e_1, w) \neq 0$.

To see this we simplify the notation and assume without loss of generality $\Lambda = \mathbb{Z}_2^3$.
Let $e_1 = (\alpha, \beta, \gamma)$ so that $(e_1, (1,0,0)) = \beta$, $(e_1, (0,1,0)) = \alpha$, and $(e_1, (0,0,1)) = 2\gamma$.
Since $\bar e_1 \neq 0$, one quickly sees that one of these inner products is not divisible
by 4. Without loss of generality we may assume $4 \nmid \beta$. Now consider the neighbor
$\Lambda' = \frac{1}{2}\mathbb{Z}_2 \times 2\mathbb{Z}_2 \times \mathbb{Z}_2$ of $\Lambda$. Then $w = (\frac{1}{2}, 0, 0) \in \Lambda'$ satisfies $(e_1, w) = \frac{1}{2}\beta \not\equiv 0 \pmod 2$.
Hence as claimed, $\bar e_1 \neq k'$ and so only one neighbor of $\Lambda'$, namely $\Lambda$ itself, can
belong to $\mathcal{R}(L)$.

It follows that there exists a line segment $I \subset \mathcal{R}(L)$ in a geodesic in $\mathcal{T}(Q)$ so that
$\mathcal{R}(L) \subset \bigcup_{\Lambda \in I} B(\Lambda, 1)$. Arguing as in Subsection A.5.2 we can bound the length of
$I$ in terms of $b$ and obtain $N(L) \le 3(4b + 3)$.

A.6.4. The case a ≥ 1. The arguments for $p > 2$ carry over to the remaining cases.

A.6.5. The non-diagonal case (A.4). So suppose now $q$ is represented by the lattice
$L = \mathbb{Z}_2 e_1 + \mathbb{Z}_2 e_2 \subset \Lambda$ with

$$Q(e_1) = u 2^b, \quad Q(e_2) = v 2^b, \quad (e_1, e_2) = w 2^a, \qquad u, v, w \in \mathbb{Z}_2^{\times}, \quad 0 \le a \le b.$$

A.6.6. The case a = 0. If $(a, b) = (0, 0)$, then $\bar e_1$ and $\bar e_2$ are linearly independent in
$\Lambda/2\Lambda$ since otherwise $w = (e_1, e_2) \equiv 0 \pmod 2$. Also note that the plane generated
by $\bar e_1$ and $\bar e_2$ does not contain any isotropic vector. However, this implies that
$e_1, e_2$ cannot be both contained in any $\Lambda_v$, for then $v^{\perp}$ would contain $\bar e_1, \bar e_2, v$, three
linearly independent vectors.

If now $(a, b) = (0, b)$ with $b \ge 1$, then $\bar e_1$ and $\bar e_2$ are two linearly independent isotropic vectors
and so $e_1$ can only be contained in $\Lambda_{\bar e_1}$. Similarly, $e_2$ is only contained in $\Lambda_{\bar e_2}$. So
$L$ cannot be contained in any neighbor of $\Lambda$.

In conclusion for $a = 0$ we have

$$N(L) = 1.$$

A.6.7. The case a = 1. In that case at least one of the vectors $\bar e_1$ and $\bar e_2$ must be a
non-zero isotropic vector, for otherwise $a \ge 2$. Suppose $\bar e_1 \neq 0$. Then $e_1 \in \Lambda_v$ only
for $\bar e_1 = v$. Therefore, $\Lambda$ can only have one neighbor in $\mathcal{R}(L)$ and so $N(L) \le 2$.

A.6.8. The case a ≥ 2. We consider again the two rank 2 lattices

$$L_1 := \mathbb{Z}_2 e_1' + \mathbb{Z}_2 e_2, \quad L_2 := \mathbb{Z}_2 e_1 + \mathbb{Z}_2 e_2', \quad e_i' = e_i/2,$$

which contain $L$ and on which $Q$ is $\mathbb{Z}_2$-valued:

$$Q(e_1') = u 2^{b-2}, \quad Q(e_2') = v 2^{b-2}, \quad (e_1', e_2) = (e_1, e_2') = w 2^{a-1}.$$

We describe now the type and the invariants of $L_1$; by symmetry $L_2$ behaves
the same way.

If $a = b$ we may solve the equation in $\beta \in \mathbb{Z}_2^{\times}$

$$0 = (e_2 + \beta e_1', e_1') = w 2^{a-1} + \beta u 2^{b-1}$$

and so $Q|_{L_1}$ is of diagonal form (A.3) in the basis $\{e_2 + \beta e_1', e_1'\}$. Furthermore, since

$$(e_2 + \beta e_1', e_2 + \beta e_1') = 2Q(e_2 + \beta e_1') = v 2^{b+1} + \beta w 2^{a-1}$$

it has invariants $(a-2, b-2)$.

If $a < b$, take $\beta = 2^{b-a}$: in the basis $\{e_2 + \beta e_1', e_1'\}$, $Q|_{L_1}$ takes the non-diagonal
form (A.4) with $(a', b') = (a-1, b-2)$. Finally $Q|_{L_1 + L_2} = Q|_{L/2}$ takes the form
(A.4) with $(a'', b'') = (a-2, b-2)$.
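The two displayed identities in the case $a = b$ can be verified with exact rational arithmetic. This is only a sketch: odd integers $u, v, w$ stand in for the 2-adic units of the text, the Gram matrix is built from $(e, e) = 2Q(e)$, and all function names are illustrative.

```python
from fractions import Fraction


def gram(u, v, w, a, b):
    """Gram matrix of (.,.) in the basis e1, e2, using (e, e) = 2 Q(e)."""
    return [[Fraction(2 * u * 2**b), Fraction(w * 2**a)],
            [Fraction(w * 2**a), Fraction(2 * v * 2**b)]]


def pair(G, x, y):
    """Bilinear pairing of coordinate vectors x, y with respect to G."""
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))


def v2(x):
    """2-adic valuation of a non-zero rational number."""
    n, d, k = x.numerator, x.denominator, 0
    while n % 2 == 0:
        n //= 2
        k += 1
    while d % 2 == 0:
        d //= 2
        k -= 1
    return k


def check_diagonal_case(u, v, w, a):
    """Return ((f, e1'), (f, f), expected (f, f), v2(Q(f))) for f = e2 + beta e1', a = b."""
    b = a
    G = gram(u, v, w, a, b)
    e1p = (Fraction(1, 2), Fraction(0))   # e1' = e1 / 2
    beta = Fraction(-w, u)                # solves w 2^(a-1) + beta u 2^(b-1) = 0
    f = (beta * e1p[0], Fraction(1))      # coordinates of e2 + beta e1'
    norm = pair(G, f, f)
    expected = v * 2**(b + 1) + beta * w * Fraction(2)**(a - 1)
    return pair(G, f, e1p), norm, expected, v2(norm / 2)


if __name__ == "__main__":
    print(check_diagonal_case(3, 5, 7, 2))
```

With $u = 3$, $v = 5$, $w = 7$ and $a = b = 2$ the orthogonality relation holds exactly, and $Q(e_2 + \beta e_1')$ has 2-adic valuation $0 = a - 2$, in line with the stated invariants.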

We then conclude exactly as in §A.5.4 by proving that either $N(L) = 1$ or (A.6)
holds. This implies once more the desired bound.

A.7. Proof of Proposition 3.4. We now show how the previous subsections combine
to give the proof of Proposition 3.4.

Recall that we are bounding the number of representations $N(L)$ of the quadratic
form $q(x, y) = a_1 x^2 + a_2 xy + a_3 y^2$ by the ternary quadratic form $Q$ up to $\mathrm{SO}_Q(\mathbb{Z})$.
For any $p$ let us write $a_p$ and $b_p$ for the invariants of $q$ over $\mathbb{Z}_p$ as in Section A.3.
Let $f^2 \mid \gcd(a_1, a_2, a_3)$ be the greatest common square divisor of the coefficients of
$q$. Then $\lfloor a_p/2 \rfloor = v_p(f)$.
By the discussion in Sections A.1-A.2 we know that

$$N(L) \ll \prod_{\substack{p \\ \mathbb{Q}_p\text{-isotropic}}} N(L_p).$$

Also recall from Section A.2 that for bounding $N(L_p)$ for $p \mid \mathrm{disc}(Q)$ we may replace
$Q$ by $xy + z^2$ and $q$ by a fixed multiple $q'$ of $q$, where the factor only depends on $Q$.
From this we see that Lemmas A.3-A.4 also hold for $p \mid \mathrm{disc}(Q)$ for $q$ and $Q$, except
that the implicit constant depends for those primes also on $Q$.

Notice that for any prime $p > 2$ we have $a_p + b_p = v_p(\mathrm{disc}(q))$ and $a_p =
v_p(\gcd(a_1, a_2, a_3))$. For $p = 2$ we have $v_2(\mathrm{disc}(q)) = a + b + 2$ in the diagonal
case and $v_2(\mathrm{disc}(q)) = 2a$ in the non-diagonal case. Also let $c > 1$ be the implied
constant in Lemmas A.3-A.4. Together with Lemmas A.3-A.4 this gives

$$N(L) \ll \prod_{p \mid 2\,\mathrm{disc}(q)} c\,(v_p(\mathrm{disc}(q)) + 1)^2\, p^{v_p(f)} \ll_{\varepsilon} f \max(|a_1|, |a_2|, |a_3|)^{\varepsilon},$$

as desired.



Appendix B. Entropy, Bowen balls and uniqueness of measure of maximal entropy

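B.1. Statement of main results. We recall some notations: we work in the space
$X = \Gamma\backslash G$ with $G = \mathrm{SL}_2(\mathbb{R})$, and let $T$ denote the time-one map of the geodesic
flow, i.e. the map

$$T : x \mapsto xa \quad \text{with} \quad a = \begin{pmatrix} e^{1/2} & \\ & e^{-1/2} \end{pmatrix}.$$

We define a Bowen $(N, \eta)$-ball in this space to be any set of the form $xB_{N,\eta}$ with
$x \in X$ and

$$B_{N,\eta} = \bigcap_{n=-N}^{N} a^{n} B^{G}_{\eta} a^{-n},$$

where $B^{G}_{\eta}$ denotes the $\eta$-ball around the identity in $G$
(in the sections above $\eta$ remained fixed and was omitted from the notations, but
here it will be convenient to be able to use Bowen balls of varying $\eta$).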

If $\Gamma$ is cocompact, for all $\eta$ sufficiently small, the Bowen $(N, \eta)$-ball $xB_{N,\eta}$ coincides
with the set

$$xB_{N,\eta} = \{y : d(T^n(x), T^n(y)) < \eta \text{ for all } -N \le n \le N\}.$$

This is not true any more for noncompact quotients, where in general the right
hand side can be significantly bigger than the left hand side, which is the source of
some complications.

The following theorem was proved for compact quotients by Bowen in [4]. It
is certainly well known also in the finite volume case, and proofs using leafwise
measures can be found e.g. in [20, Prop. 9.6] and the more recent lecture notes [12,
Thm. 7.9].

Theorem B.1. Let $X = \Gamma\backslash\mathrm{SL}_2(\mathbb{R})$ and $T : X \to X$ be as above. Then for any $T$-invariant
probability measure $\nu$ the entropy satisfies $h_\nu(T) \le 1$. Moreover, equality
holds if and only if $\nu = \mu_X$ is the $\mathrm{SL}_2(\mathbb{R})$-invariant probability measure on $X$.
We give here a direct proof not using leafwise measures, based on Lemma B.2
(which is identical to Lemma 5.3 and was needed for the proofs in §4), in the spirit
of Bowen's proof (that in itself was inspired by a proof by Adler and Weiss [1] of
the uniqueness of measure of maximal entropy in irreducible shifts of finite type).

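Lemma B.2. Let $\mu$ be an $A$-invariant measure on $X = \Gamma\backslash\mathrm{SL}(2,\mathbb{R})$. Fix $\eta > 0$
and $\varepsilon \in (0, 1)$. For any $N \ge 1$ we let $BC_\eta(N, \varepsilon)$ be the minimal number of Bowen
$(N, \eta)$-balls needed to cover any subset of $X$ of measure bigger than $1 - \varepsilon$. Then

$$h_\mu(T) \le \lim_{\varepsilon \to 0} \liminf_{N \to \infty} \frac{\log BC_\eta(N, \varepsilon)}{2N}. \tag{B.1}$$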

It is easy to see that for any $\eta, \eta' > 0$ a Bowen $(N, \eta)$-ball can be covered by $\ll 1$
Bowen $(N, \eta')$-balls. Therefore,

$$\liminf_{N \to \infty} \log BC_\eta(N, \varepsilon)/2N \tag{B.2}$$

is independent of $\eta$. One can show that if $\mu$ is ergodic, equality holds in (B.1), and
moreover that the quantity in (B.2) is independent of $\varepsilon$. If $\mu$ is not ergodic, then
in general equality in (B.1) fails: in this case $h_\mu(T)$ is the average of the entropy
of the ergodic components of $\mu$ and the right hand side of (B.1) gives the essential
supremum of the entropies of the ergodic components of $\mu$. We shall not need
either fact (nor will we prove them), but will use the following related estimates for
$\mu$ ergodic:

Lemma B.3. Assume that $\mu$ is in addition ergodic for $T$. Then for any sufficiently
small $\eta$ (depending only on $X$) and for any $\varepsilon \in (0, 1)$ and any large enough
$N$ (depending on $\mu, \varepsilon$), for any $\varepsilon_1 \in (0, \varepsilon)$, if $k$ is sufficiently large (depending
on $\varepsilon_1, \varepsilon, N, \eta, \mu$) then

$$\log BC_\eta(kN, \varepsilon_1) \le k(1 - 2\varepsilon) \log BC_\eta(N, \varepsilon) + 4\varepsilon N k + qk.$$

Here $q$ is some absolute constant.

For our proof of Theorem B.1 it is crucial that the second error term ($qk$) does
not depend on $N$. Roughly speaking the lemma says: if we manage to cover some
set of measure bigger than $1 - \varepsilon$ by relatively few Bowen $(N, \eta)$-balls, then a set of
measure bigger than $1 - \varepsilon_1$ can also be covered by relatively few Bowen $(Nk, \eta)$-balls.

The reader may wish to look now at the proof of Theorem B.1 in Subsection B.4
to see how the above two lemmas are used in combination to imply the uniqueness
of the measure of maximal entropy.

B.2. Proof of Lemma B.2. In the proof we will need the notion of relative entropy
for partitions: For two partitions $\mathcal{P} = \{S_1, \ldots, S_\ell\}$ and $\mathcal{Q} = \{Q_1, \ldots, Q_m\}$ of a
probability space $(X, \mu)$ the relative entropy of $\mathcal{P}$ given $\mathcal{Q}$ is defined by

$$H_\mu(\mathcal{P}|\mathcal{Q}) = \sum_{i,j} \mu(S_i \cap Q_j) \log \frac{\mu(Q_j)}{\mu(S_i \cap Q_j)},$$

and it is easy to see that it gives the following additivity of entropy

$$H_\mu(\mathcal{P} \vee \mathcal{Q}) = H_\mu(\mathcal{Q}) + H_\mu(\mathcal{P}|\mathcal{Q}). \tag{B.3}$$

We shall also use the notation $\mathcal{P}(x)$ to denote the element of the finite or countable
partition $\mathcal{P}$ containing $x$.
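The additivity (B.3) is easy to confirm numerically on a small example. The sketch below uses a hypothetical joint distribution `joint[i][j]` $= \mu(S_i \cap Q_j)$ chosen only for illustration; the function names are not from the paper.

```python
import math


def entropy(masses):
    """Shannon entropy -sum m log m of a list of masses summing to 1."""
    return -sum(m * math.log(m) for m in masses if m > 0)


def relative_entropy(joint):
    """H(P | Q) = sum_{i,j} mu(S_i ∩ Q_j) log( mu(Q_j) / mu(S_i ∩ Q_j) )."""
    cols = range(len(joint[0]))
    q = [sum(row[j] for row in joint) for j in cols]   # mu(Q_j)
    return sum(m * math.log(q[j] / m)
               for row in joint for j, m in enumerate(row) if m > 0)


# Hypothetical masses mu(S_i ∩ Q_j) for a 2 x 3 pair of partitions.
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

h_join = entropy([m for row in joint for m in row])               # H(P v Q)
h_q = entropy([sum(row[j] for row in joint) for j in range(3)])   # H(Q)
h_rel = relative_entropy(joint)                                   # H(P | Q)

if __name__ == "__main__":
    print(abs(h_join - (h_q + h_rel)) < 1e-12)
```

The same decomposition, applied to the two-set partition $\{P, X \setminus P\}$, is what drives the entropy estimate at the end of the proof.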
Proof. Let $\mathcal{P} = \{Q, S_1, \ldots, S_\ell\}$ be a finite partition where $Q$ is the only unbounded
set, all boundaries $\partial S_i$ are null sets which satisfy additionally

$$\mu((\partial S_i) B^{G}_{\kappa}) \le C\kappa$$

for some constant $C > 0$ and all $\kappa > 0$, and finally $h_\mu(T, \mathcal{P}) > h_\mu(T) - \delta$. Here

$$h_\mu(T, \mathcal{P}) = \lim_{N \to \infty} \frac{1}{2N} H_\mu(\mathcal{P}^{[-N,N]})$$

is the expression over which one needs to take the supremum to define $h_\mu(T)$.
Such a partition exists since (i) by the general theory of entropy $h_\mu(T)$ can be
approximated by $h_\mu(T, \mathcal{P})$ once $\mathcal{P}$ is a sufficiently fine partition, and (ii) one can
find for every $x \in X$ arbitrarily small $r > 0$ for which $\mu((\partial B_r(x)) B^{G}_{\kappa}) \le C\kappa$ for all
$\kappa > 0$ (since for every $x$ the function $r \mapsto \mu(B_r(x))$ is monotone increasing, hence
differentiable for a.e. $r$).

We claim that for most points $x \in X$ (we shall quantify this presently) it holds
that

$$\mathcal{P}^{[-N,N]}(x) \supset xB_{N,2\eta'} \quad \text{for } \eta' = \eta N^{-2}, \tag{B.4}$$

hence if $y \in xB_{N,\eta'}$, then $yB_{N,\eta'} \subset \mathcal{P}^{[-N,N]}(x)$. To show this, suppose $y = xh \notin
\mathcal{P}^{[-N,N]}(x)$ for $h \in B_{N,2\eta'}$. Then for some $n$ with $|n| \le N$ the elements

$$xa^n \text{ and } xha^n$$

belong to different elements of $\mathcal{P}$. It follows that at least one of the elements $xa^n$
belongs to $(\partial P) B^{G}_{2\eta'}$ for some $P \in \mathcal{P}$ and $|n| \le N$. Therefore, $x$ belongs to

$$\bigcup_{n=-N}^{N} T^{n} \bigcup_{S \in \mathcal{P}} (\partial S) B^{G}_{2\eta'}, \tag{B.5}$$

which has measure less than $2(2N + 1)\ell C \eta N^{-2} \ll N^{-1}$. This proves the above
claim.

Roughly speaking $B_{N,\eta}$ has length $\eta$ in the direction of $A$ and $\eta e^{-N}$ along stable
and unstable horocycle directions while $B_{N,\eta'}$ has $\eta N^{-2}$ and $\eta N^{-2} e^{-N}$ instead.
From this one can easily deduce that one needs at most $\ll N^6$ many translates
of $B_{N,\eta'}$ to cover $B_{N,\eta}$. Choose $f > \lim_{\varepsilon \to 0} \liminf_{N \to \infty} \frac{\log BC_\eta(N,\varepsilon)}{2N}$. Then for any
$\varepsilon > 0$, there is some large $N \ge 1$ depending on $\varepsilon$ such that the measure of the set
in (B.5) is less than $\varepsilon$, and moreover such that $1 - \varepsilon$ of the space can be covered by
less than $e^{2Nf}$ many translates of the set $B_{N,\eta'}$.
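The shape of a Bowen ball comes from how conjugation by powers of $a$ rescales the horocycle directions. The following sketch multiplies out $a^{-n} u(s) a^{n}$ for the upper unipotent $u(s)$ and records the largest off-diagonal entry over $|n| \le N$; requiring this to stay below $\eta$ forces $|s| \le \eta e^{-N}$. The helper names are illustrative and the matrix norm is replaced by a single entry for simplicity.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def a_power(n):
    """a^n for a = diag(e^{1/2}, e^{-1/2})."""
    return [[math.exp(n / 2), 0.0], [0.0, math.exp(-n / 2)]]


def conjugated_entry(s, n):
    """Upper-right entry of a^{-n} u(s) a^{n}, where u(s) = [[1, s], [0, 1]]."""
    u = [[1.0, s], [0.0, 1.0]]
    return mat_mul(mat_mul(a_power(-n), u), a_power(n))[0][1]


def worst_expansion(s, N):
    """Largest |upper-right entry| of a^{-n} u(s) a^{n} over -N <= n <= N."""
    return max(abs(conjugated_entry(s, n)) for n in range(-N, N + 1))


if __name__ == "__main__":
    s, N = 1e-3, 5
    print(worst_expansion(s, N), s * math.exp(N))
```

Shrinking $\eta$ to $\eta N^{-2}$ shrinks each of the three directions by a factor $N^{2}$, which is where the count $N^{6} = (N^{2})^{3}$ of translates comes from.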

Say $y_1 B_{N,\eta'}, \ldots, y_k B_{N,\eta'}$ (with $k \le e^{2Nf}$) cover $X_1 \subset X$ with $\mu(X_1) > 1 - \varepsilon$.
Suppose $x \in X_1$ is not in the union in (B.5). Since $x \in y_j B_{N,\eta'}$ for some $j$, it follows from
(B.4) that $y_j B_{N,\eta'} \subset \mathcal{P}^{[-N,N]}(y_j)$. In other words, it follows that $1 - 2\varepsilon$ of the
space can be covered by $e^{2Nf}$ elements of the partition $\mathcal{P}^{[-N,N]}$.

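Let $P$ be the union of these partition elements and let $\mathcal{V} = \{P, X \setminus P\} \subset \sigma(\mathcal{P}^{[-N,N]})$
be the associated partition. Write $\mu_B = (\mu(B))^{-1} \mu|_B$ for the normalized restriction
of the measure $\mu$ to a Borel set $B$. It follows now from (B.3) that

$$\begin{aligned}
H_\mu(\mathcal{P}^{[-N,N]}) &= H_\mu(\mathcal{V}) + H_\mu(\mathcal{P}^{[-N,N]}|\mathcal{V}) \\
&= H_\mu(\mathcal{V}) + \mu(P) H_{\mu_P}(\mathcal{P}^{[-N,N]}) + \mu(X \setminus P) H_{\mu_{X \setminus P}}(\mathcal{P}^{[-N,N]}) \\
&\le \log 2 + 2Nf + 4\varepsilon N \ell
\end{aligned}$$

since the entropy of a partition with $K$ elements is at most $\log K$. For $N \to \infty$ this
shows that

$$h_\mu(T) - \delta < h_\mu(T, \mathcal{P}) \le f + 2\varepsilon\ell,$$

which implies the lemma since $\delta$ and $\varepsilon$ were arbitrary. (Note that $\ell$ depends on $\delta$
but not on $\varepsilon$.) □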

B.3. Proof of Lemma B.3. We shall say a Bowen ball $yB_{N,\eta}$ is injective if the
map $g \mapsto yg$ is injective on $B_{N,\eta}$. Let $\eta_0 > 0$ be such that $2\eta_0$ is smaller than
the length of any closed geodesic in $X$. An easy compactness argument shows that
if $\eta < \eta_0$, then for any compact $F \subset X$ there is an $N_0$ so that if $N > N_0$ and $y \in F$
the Bowen ball $yB_{N,\eta}$ is injective. In the proof we shall also make use of shifted
$(s, t; \eta)$-Bowen balls, which are sets of the form $yB_{s,t;\eta}$ where $B_{s,t;\eta} := \bigcap_{l=s}^{t} a^{l} B^{G}_{\eta} a^{-l}$, and
$(s, t; \eta)$ sub-Bowen balls, which are simply sets of the form $yB$ for some $B \subset B_{s,t;\eta}$.
A shifted $(s, t; \eta)$-Bowen ball $yB_{s,t;\eta}$ (respectively, an $(s, t; \eta)$ sub-Bowen ball $yB$) is
injective if the map $g \mapsto yg$ is injective on $B_{s,t;\eta}$ (or $B$). We note the following
important properties of shifted Bowen balls:

(Bowen-1). For any $s < t < r$, the intersection of an injective $(s, t; \eta)$ sub-Bowen
ball with an injective $(t, r; \eta)$ sub-Bowen ball can be covered by at most
$q$ injective $(s, r; \eta)$ sub-Bowen balls;

(Bowen-2). For any $s < t < r$, an injective $(s, t; \eta)$ sub-Bowen ball can be covered
by at most $qe^{r-t}$ injective $(s, r; \eta)$ sub-Bowen balls.

Proof of claims. Both claims can easily be reduced to their special cases where
$t = 0$ and where we only consider Bowen balls of the form $gB_{s,r;\eta}$ in $G$ instead of
injective sub-Bowen balls in $X$.

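For the proof of (Bowen-1) notice that there exists some $C > 0$ so that

$$g_1 B_{s,0;\eta} \subset g_1 B^{U^+}_{C\eta} B^{U^-}_{C\eta e^{s}} B^{A}_{C\eta}, \tag{B.6}$$

where $B^{H}_{r}$ denotes the $r$-ball around the identity in a subgroup $H \subset \mathrm{SL}_2(\mathbb{R})$.
Similarly,

$$g_2 B_{0,r;\eta} \subset g_2 B^{U^+}_{C\eta e^{-r}} B^{U^-}_{C\eta} B^{A}_{C\eta}. \tag{B.7}$$

We can now decompose each of the balls appearing on the right hand side of (B.6)-(B.7)
into $\ll 1$ many balls with certain smaller radius and obtain that $g_1 B_{s,0;\eta} \cap
g_2 B_{0,r;\eta}$ is the union of $\ll 1$ many sets of the form

$$O = \left(g_1 u_1^+ B^{U^+}_{\eta/8} u_1^- B^{U^-}_{\eta e^{s}/8} a_1 B^{A}_{\eta/8}\right) \cap \left(g_2 u_2^+ B^{U^+}_{\eta e^{-r}/8} u_2^- B^{U^-}_{\eta/8} a_2 B^{A}_{\eta/8}\right),$$

where $u_1^+ \in B^{U^+}_{C\eta}$, $u_2^+ \in B^{U^+}_{C\eta e^{-r}}$, $u_1^- \in B^{U^-}_{C\eta e^{s}}$, $u_2^- \in B^{U^-}_{C\eta}$, $a_1, a_2 \in B^{A}_{C\eta}$. If $g \in O$
and $\eta_0$ is sufficiently small so that conjugation by an element of distance $C\eta_0$ does
not increase the distance to the identity significantly, it follows that $O \subset gB_{s,r;\eta}$,
which proves the first claim.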

The second claim follows similarly by splitting the set $B_{s,0;\eta}$ as in (B.6) into
$\ll e^{r}$ many sets of the form

$$O = g_1 u_1^+ B^{U^+}_{\eta e^{-r}/8} u_1^- B^{U^-}_{\eta e^{s}/8} a_1 B^{A}_{\eta/8}$$

with $u_1^+ \in B^{U^+}_{C\eta}$ and $u_1^- \in B^{U^-}_{C\eta e^{s}}$, and showing that for $g \in O$ we have $O \subset gB_{s,r;\eta}$.
Proof of Lemma B.3. Let $\eta \in (0, \eta_0)$ where $\eta_0$ is as defined above, and let $M$
be sufficiently large so that $\mu(X_{<M}) > 1 - \varepsilon/2$ and similarly choose $M_1$ so that
$\mu(X_{<M_1}) > 1 - \varepsilon_1/2$. We require that $N$ be sufficiently large so that any $(N, \eta)$-Bowen
ball $yB_{N,\eta}$ intersecting $X_{<M}$ is injective, and we choose $k_1$ so that a similar
statement holds for any $(k_1 N, \eta)$-Bowen ball intersecting $X_{<M_1}$.

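Let $\mathcal{S}$ be a collection of $(N, \eta)$-Bowen balls of cardinality $BC_\eta(N, \varepsilon)$ covering a
subset of $X$ with $\mu$-measure at least $1 - \varepsilon$. Then

$$\mathcal{S}' = \{B \in \mathcal{S} : B \cap X_{<M} \neq \emptyset\}$$

has $\mu\left(\bigcup_{B \in \mathcal{S}'} B\right) \ge 1 - \frac{3\varepsilon}{2}$. Let $Y = \bigcup_{B \in \mathcal{S}'} B$. By the pointwise ergodic theorem,
there is a $k_2 > k_1$ and a subset $Y_1 \subset X_{<M_1}$ of $\mu$-measure $> 1 - \varepsilon_1$ so that points
in $Y_1$ spend most of their time in $Y$ in the following sense:

$$\frac{1}{2n} \sum_{s=-n}^{n-1} 1_Y(T^s(y)) > 1 - 2\varepsilon \quad \text{for all } n \ge k_2 N \text{ and } y \in Y_1. \tag{B.8}$$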

To complete the proof of Lemma B.3 we will show that for any $k \ge k_2$ there is
a collection $\mathcal{S}_1$ of $(kN, \eta)$-Bowen balls covering $Y_1$ of cardinality

$$|\mathcal{S}_1| \ll N (2q)^k\, BC_\eta(N, \varepsilon)^{k(1-2\varepsilon)}\, e^{(4\varepsilon k + 4)N}.$$

Let $c$ be the implied multiplicative constant. Then for large enough $q'$ (depending
only on $q$ and some absolute constants above) we have $cN(2q)^k e^{4N} < e^{q'k}$ for all
sufficiently large $k$ (where the bound is allowed to depend on $N$). Therefore, the
existence of $\mathcal{S}_1$ as above will establish the lemma.

Fix $k \ge k_2$ and let $y \in Y_1$. We partition the finite orbit $\{T^{-kN}(y), \ldots, T^{kN-1}(y)\}$
into the $2N$ finite orbits of the form $\{T^{-kN+\ell}(y), T^{(-k+2)N+\ell}(y), \ldots, T^{(k-2)N+\ell}(y)\}$
for $\ell \in \{0, \ldots, 2N - 1\}$. By equation (B.8) there must for any $y \in Y_1$ exist some
$\ell(y) \in \{0, \ldots, 2N - 1\}$ so that

$$\frac{1}{k} \sum_{t=0}^{k-1} 1_Y\left(T^{(-k+2t)N+\ell(y)}(y)\right) > 1 - 2\varepsilon.$$
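The existence of $\ell(y)$ is a pigeonhole (averaging) argument: the overall visit frequency is the mean of the $2N$ frequencies along the arithmetic progressions, so at least one progression does at least as well as the overall average. A minimal sketch, with a hypothetical 0/1 visit record `hits` indexed so that `hits[i]` corresponds to $T^{-kN+i}(y)$:

```python
def best_residue_class(hits, k, N):
    """Return (l, average) for the progression {l + 2tN : 0 <= t < k} with the
    highest visit frequency; hits[i] is 1 if T^{-kN+i}(y) lies in Y, else 0."""
    assert len(hits) == 2 * k * N
    averages = [sum(hits[l + 2 * t * N] for t in range(k)) / k
                for l in range(2 * N)]
    best = max(range(2 * N), key=lambda l: averages[l])
    return best, averages[best]


if __name__ == "__main__":
    k, N = 4, 3
    # Hypothetical record: the orbit misses Y at every fifth time step.
    hits = [0 if i % 5 == 0 else 1 for i in range(2 * k * N)]
    overall = sum(hits) / len(hits)
    l, avg = best_residue_class(hits, k, N)
    print(l, avg, overall)
    assert avg >= overall
```

In this toy record the overall frequency is $19/24$, while the progression with $\ell = 1$ never leaves $Y$.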

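Let $L = \lceil (1 - 2\varepsilon)k \rceil$. It follows that there are $0 \le t_1 < t_2 < \cdots < t_L < k$
with $T^{(-k+2t_i)N+\ell(y)}(y) \in Y$. Furthermore, there exist injective $(N, \eta)$-Bowen balls
$B_1, \ldots, B_L \in \mathcal{S}$ so that

$$y \in \bigcap_{i=1}^{L} T^{-((-k+2t_i)N+\ell(y))} B_i.$$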

Recall that $\mathcal{S}$ has $BC_\eta(N, \varepsilon)$ many elements. We now apply the properties (Bowen-1)
and (Bowen-2), and we conclude that the set of all $y \in Y_1$ with a given value of
$\ell(y)$ and $t_1, \ldots, t_L$ can be covered by

$$\ll BC_\eta(N, \varepsilon)^{k(1-2\varepsilon)+1}\, e^{4Nk\varepsilon + 2N}\, q^{k+1}$$

injective $(kN, \eta)$-Bowen balls. Since there are at most $2N 2^k$ choices of $\ell(y)$ and
$t_1, \ldots, t_L$ we are done. □
B.4. Proof of Theorem B.1. We begin with the observation that the $\mathrm{SL}(2,\mathbb{R})$-invariant
measure $\mu_X$ on $X$ achieves the upper bound stated on the entropy, and
moreover is ergodic under $T$. Let $\nu \neq \mu_X$ be another $T$-invariant probability
measure and without loss of generality we may assume that $\nu$ is singular with
respect to $\mu_X$ (which is the case e.g. if $\nu$ is also ergodic), and let $\eta_0$ be as in the
proof of Lemma B.3.

Let $f$ be a nonnegative, continuous, compactly supported function so that

$$\int f \, d\mu_X < \int_0^1 dt \int f(xa_t) \, d\nu, \tag{B.9}$$

let $R$ be some real number strictly between the left-hand side and right-hand side of (B.9),
and set

$$Y_T = \left\{ x : \frac{1}{T} \int_0^T f(xa_t) \, dt \ge R \right\}.$$



By construction $Y_T$ is compact, and (for $\varepsilon > 0$ arbitrary) by the pointwise ergodic
theorem if $T$ is large enough $\mu_X(Y_T) < \varepsilon$ and $\nu(Y_T) > 1 - \varepsilon$. In fact, if $T$ is large
enough, for any sufficiently large $N$ it holds that

$$\mu_X(Y_T B_{N,\eta_0}) < 2\varepsilon. \tag{B.10}$$

Fix such a $T$, and choose $N$ so that (B.10) holds and moreover any $(N, \eta_0)$-Bowen
ball intersecting $Y_T$ is injective.

Now choose a maximal collection of disjoint $(N, \eta_0/2)$-Bowen balls intersecting
$Y_T$. Each of these balls has $\mu_X$-volume $\gg_{\eta_0} e^{-2N}$ (the implicit constant is independent
of $\varepsilon$ and $N$). In view of (B.10), it follows that the cardinality of this collection
is $\ll_{\eta_0} \varepsilon e^{2N}$, and by maximality the corresponding collection of $(N, \eta_0)$-Bowen balls
covers $Y_T$. As $\nu(Y_T) > 1 - \varepsilon$ we obtain $BC_{\eta_0}(N, \varepsilon, \nu) \ll_{\eta_0} \varepsilon e^{2N}$ (note that since
we are simultaneously discussing two measures we have added $\nu$ to the notation
$BC(\cdot)$).

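Roughly speaking the above upper bound should lead to $h_\nu(T) < 1$ by using
Lemma B.2: most of the space with respect to $\nu$ is covered by relatively few, namely
$\le C\varepsilon e^{2N}$, Bowen $(N, \eta_0)$-balls. However, as (B.1) first takes the limit as $N \to \infty$,
this inequality does not directly imply $h_\nu(T) < 1$. To overcome this we introduce
an $\varepsilon' \in (0, \varepsilon)$ and will use Lemma B.3 to obtain the bound on the covering number
for $\varepsilon'$ and $kN$. Indeed applying Lemma B.3 we conclude that for any $\varepsilon' \in (0, \varepsilon)$, if
$k$ is sufficiently large,

$$\begin{aligned}
\log BC_{\eta_0}(kN, \varepsilon', \nu) &\le k(1 - 2\varepsilon)(2N + \log(C\varepsilon)) + 4\varepsilon kN + qk \\
&\le k(1 - 2\varepsilon)2N + \tfrac{1}{2} k \log(C\varepsilon) + 4\varepsilon kN + qk = 2Nk + \left(q + \tfrac{1}{2}\log(C\varepsilon)\right) k,
\end{aligned}$$

where we also assumed $\varepsilon < 1/4$ and $C\varepsilon < 1$. Hence we obtain for any $\varepsilon' \in (0, \varepsilon)$
that

$$\liminf_{k \to \infty} \frac{\log BC_{\eta_0}(kN, \varepsilon', \nu)}{2kN} \le 1 + \frac{q + \frac{1}{2}\log(C\varepsilon)}{2N}.$$

However, for sufficiently small $\varepsilon$ the right hand side is $< 1$. Hence by Lemma B.2
we get $h_\nu(T) < 1$. Therefore, $\mu_X$ is the only probability measure on $X$ with
$h_{\mu_X}(T) \ge 1$.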
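The elementary arithmetic behind the last chain of inequalities can be checked numerically. In the sketch below the values of $C$, $q$, $N$, $k$ and $\varepsilon$ are hypothetical and chosen only so that $\varepsilon < 1/4$, $C\varepsilon < 1$ and $q + \frac{1}{2}\log(C\varepsilon) < 0$; the function names are illustrative.

```python
import math


def lemma_b3_bound(k, N, eps, C, q):
    """k(1 - 2 eps)(2N + log(C eps)) + 4 eps k N + q k."""
    return k * (1 - 2 * eps) * (2 * N + math.log(C * eps)) + 4 * eps * k * N + q * k


def simplified_bound(k, N, eps, C, q):
    """2Nk + (q + (1/2) log(C eps)) k."""
    return 2 * N * k + (q + 0.5 * math.log(C * eps)) * k


def growth_rate(N, eps, C, q):
    """simplified_bound / (2kN), which does not depend on k."""
    return 1 + (q + 0.5 * math.log(C * eps)) / (2 * N)


if __name__ == "__main__":
    C, q, N, k, eps = 10.0, 3.0, 50, 100, 1e-5   # hypothetical values
    assert eps < 0.25 and C * eps < 1
    assert lemma_b3_bound(k, N, eps, C, q) <= simplified_bound(k, N, eps, C, q) + 1e-9
    print(growth_rate(N, eps, C, q))
```

The two $4\varepsilon kN$ terms cancel exactly, and the step to $\frac{1}{2}\log(C\varepsilon)$ uses $1 - 2\varepsilon > \frac{1}{2}$ together with $\log(C\varepsilon) < 0$; the growth rate drops below 1 as soon as $\varepsilon < e^{-2q}/C$.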